Abstract

A secret sharing scheme is a method to store information securely and reliably. In particular, in a threshold secret sharing scheme, a secret is encoded into $n$ shares such that any set of at least $t_1$ shares suffices to decode the secret, and any set of at most $t_2 < t_1$ shares reveals no information about the secret. Assuming that each party holds a share and a user wishes to decode the secret by receiving information from a set of parties, the question we study is how to minimize the amount of communication between the user and the parties. We show that the necessary amount of communication, termed “decoding bandwidth”, decreases as the number of parties that participate in decoding increases. We prove a tight lower bound on the decoding bandwidth and construct secret sharing schemes achieving the bound. In particular, we design a scheme that achieves the optimal decoding bandwidth when $d$ parties participate in decoding, universally for all $t_1 \le d \le n$. The scheme is based on Shamir’s secret sharing scheme and preserves its simplicity and efficiency. In addition, we consider secure distributed storage, where the proposed communication-efficient secret sharing schemes further improve disk access complexity during decoding.

Index Terms 

Security, secret sharing, communication bandwidth, distributed storage, Reed-Solomon codes. 

I. Introduction 

Consider the scenario that $n$ parties wish to store a secret securely and reliably. To this end, a dealer distributes the secret into $n$ shares, i.e., one share for each party, such that 1) (reliability) a collection $\mathcal{A}$ of “authorized” subsets of the parties can decode the secret, and 2) (secrecy) a collection $\mathcal{B}$ of “blocked” subsets of the parties cannot collude to deduce any information about the secret. A scheme to distribute the secret into shares with respect to the access structure $(\mathcal{A}, \mathcal{B})$ is called a secret sharing scheme, initially studied in the seminal works by Shamir [14] and Blakley [3]. A secret sharing scheme is perfect if every subset of parties is either authorized or blocked, i.e., $\mathcal{A} \cup \mathcal{B} = 2^{\{1, \dots, n\}}$. The scheme is referred to as a ramp scheme if it is not perfect. Besides its application in distributed storage of secret data, secret sharing has become a fundamental cryptographic primitive and is used as a building block in numerous secure protocols [1].

We focus on secret sharing schemes for the threshold access structure, i.e., $\mathcal{A}$ contains all subsets of $\{1, \dots, n\}$ of size at least $n - r$, and $\mathcal{B}$ contains all subsets of $\{1, \dots, n\}$ of size at most $z$. In other words, the secret can be decoded in the absence of any $r$ parties, and any $z$ parties cannot collude to deduce any information about the secret. The threshold access structure is particularly important in practice because, for this case, space- and computationally efficient secret sharing schemes are known. Specifically, Shamir [14] constructs an elegant and efficient perfect threshold scheme using the idea of polynomial interpolation. Shamir’s scheme was later shown to be closely related to Reed-Solomon codes [12] and was generalized to ramp schemes in [4], [18], which have significantly better space efficiency, i.e., rate, than the original perfect scheme. Shamir’s scheme and the generalized ramp schemes achieve optimal usage of storage space, in the sense that, for a fixed share size, they store a secret of maximum size. The schemes are computationally efficient, as decoding the secret is equivalent to polynomial interpolation. An example of Shamir’s ramp scheme is shown in Figure 1. Other threshold secret sharing schemes and generalizations of Shamir’s scheme may be found in [9], [19], [11], [10]. The reader is also referred to [1] for an up-to-date survey on secret sharing.


Party 1: $f(1) = m_1 + m_2 + k$
Party 2: $f(2) = m_1 + 2m_2 + 4k$
Party 3: $f(3) = m_1 + 3m_2 + 9k$
Party 4: $f(4) = m_1 + 4m_2 + 5k$
Party 5: $f(5) = m_1 + 5m_2 + 3k$
Party 6: $f(6) = m_1 + 6m_2 + 3k$
Party 7: $f(7) = m_1 + 7m_2 + 5k$

Fig. 1: Shamir’s scheme (ramp version) for $n = 7$, $r = 4$, $z = 1$, with symbols over $\mathbb{F}_{11}$. The scheme stores a secret of two symbols, denoted by $m_1, m_2$. Let $k$ be a uniformly and independently distributed random variable, and let $f(x) = m_1 + m_2 x + k x^2$. Note that the share stored by any single party is independent of the secret because it is padded by $k$, and that the secret can be decoded from the shares stored by any three parties by polynomial interpolation.
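To make Figure 1 concrete, the following is a minimal Python sketch (not taken from the paper) of the ramp scheme it depicts: party $i$ stores $f(i) = m_1 + m_2 i + k i^2$ over $\mathbb{F}_{11}$, and any three shares determine $f$ by Lagrange interpolation. The helper names `encode`, `interpolate` and `decode` are hypothetical, and plain integer arithmetic modulo 11 stands in for a finite-field library.

```python
from itertools import combinations
import random

P = 11  # Figure 1 works over F_11; parties are the evaluation points 1..7


def encode(m1, m2, k):
    """Share of party i is f(i) = m1 + m2*i + k*i^2 (mod 11), for i = 1, ..., 7."""
    return {i: (m1 + m2 * i + k * i * i) % P for i in range(1, 8)}


def interpolate(points):
    """Coefficients [a_0, ..., a_{t-1}] (lowest degree first) of the unique
    polynomial of degree < t through the t points (x, y), via Lagrange mod P."""
    t = len(points)
    coeffs = [0] * t
    for j, (xj, yj) in enumerate(points):
        num, denom = [1], 1  # num = prod_{l != j} (x - x_l), denom = prod (x_j - x_l)
        for l, (xl, _) in enumerate(points):
            if l != j:
                # multiply num by (x - xl): shift up one degree, subtract xl * num
                num = [(a - xl * b) % P for a, b in zip([0] + num, num + [0])]
                denom = denom * (xj - xl) % P
        scale = yj * pow(denom, P - 2, P) % P  # divide by denom (Fermat inverse)
        for deg in range(t):
            coeffs[deg] = (coeffs[deg] + scale * num[deg]) % P
    return coeffs


def decode(shares, parties):
    """Recover (m1, m2) from the shares of any 3 parties; the x^2 coefficient is the key."""
    m1, m2, _key = interpolate([(i, shares[i]) for i in parties])
    return m1, m2


# Usage: every set of 3 of the 7 shares recovers the secret.
shares = encode(3, 9, k=random.randrange(P))
assert all(decode(shares, S) == (3, 9) for S in combinations(range(1, 8), 3))
```

Decoding reads only the constant and linear coefficients of the interpolated polynomial; the quadratic coefficient is the key $k$ and is discarded.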
In addition to space and computational efficiency, this paper studies the communication efficiency of secret sharing schemes. Consider the scenario that a user wishes to decode the secret by downloading information from the parties that are available. Referring to the amount of information downloaded by the user as the decoding bandwidth, a natural question is what minimum decoding bandwidth suffices for decoding. It is of practical interest to design secret sharing schemes that achieve a small decoding bandwidth, or in other words, that require communicating only a small amount of information during decoding. In such a case, decoding is completed in a timely manner and communication resources are utilized more efficiently.

In many existing secret sharing schemes, e.g., [14], [12], [9], [4], [18], [19], [11], [10], a common practice in decoding is that the user communicates with a minimum set of parties, i.e., exactly $n - r$ parties (even if $d > n - r$ parties are available), and downloads the whole share stored by these parties. Wang and Wong [17] show that this paradigm is not optimal in terms of communication and that the decoding bandwidth can be reduced if the user downloads only part of the share from each of the $d > n - r$ available parties. Specifically, given $d$, for any perfect threshold secret sharing scheme, [17] derives a lower bound on the decoding bandwidth when exactly $d$ parties participate in decoding, and designs a perfect scheme that achieves the lower bound. The field size of the scheme is slightly improved in [20]. However, two interesting and important problems remain open. 1) The schemes in [17], [20] achieve the lower bound on decoding bandwidth when the number of available parties $d$ equals a single specific value, and do not achieve the bound if $d$ takes other values. This raises the question whether the lower bound is uniformly tight, or in other words, whether it is possible to design a single scheme that achieves the lower bound universally for all $d$ in the range $[n - r, n]$. 2) The results in [17], [20] target the case of perfect secret sharing schemes. It is well known that for any perfect scheme, the size of each share is at least as large as the size of the secret [15], [7], i.e., the rate of a perfect scheme is at most $1/n$. Any scheme with a higher rate is necessarily a (non-perfect) ramp scheme, which raises the question of how to generalize the results and ideas to non-perfect schemes. Both problems are of practical importance: the first addresses the flexibility of a scheme in terms of decoding, and the second addresses the high-rate case, which is a typical requirement in many practical applications. In this paper we settle both problems and construct (perfect and ramp) schemes of flexible rate that achieve the optimal decoding bandwidth universally. Similar to Shamir’s scheme, our schemes are computationally efficient and have optimal space efficiency.
A. Motivating Example


Party 1: $m_1 + m_2 + k_1$, $m_3 + m_4 + k_2$, $m_5 + m_6 + k_3$
Party 2: $m_1 + 2m_2 + 4k_1$, $m_3 + 2m_4 + 4k_2$, $m_5 + 2m_6 + 4k_3$
Party 3: $m_1 + 3m_2 + 9k_1$, $m_3 + 3m_4 + 9k_2$, $m_5 + 3m_6 + 9k_3$
Party 4: $m_1 + 4m_2 + 5k_1$, $m_3 + 4m_4 + 5k_2$, $m_5 + 4m_6 + 5k_3$
Party 5: $m_1 + 5m_2 + 3k_1$, $m_3 + 5m_4 + 3k_2$, $m_5 + 5m_6 + 3k_3$
Party 6: $m_1 + 6m_2 + 3k_1$, $m_3 + 6m_4 + 3k_2$, $m_5 + 6m_6 + 3k_3$
Party 7: $m_1 + 7m_2 + 5k_1$, $m_3 + 7m_4 + 5k_2$, $m_5 + 7m_6 + 5k_3$


(a) Shamir’s Scheme 


Party 1: $f(1) = k_1 + m_1 + m_2 + m_3 + m_4 + m_5 + m_6$, $g(1) = k_2 + m_4 + m_5 + m_6$, $h(1) = k_3 + m_3 + m_6$
⋮
Party 7: $f(7) = k_1 + 7m_1 + 5m_2 + 2m_3 + 3m_4 + 10m_5 + 4m_6$, $g(7) = k_2 + 7m_4 + 5m_5 + 2m_6$, $h(7) = k_3 + 7m_3 + 5m_6$


(b) Proposed Scheme 

Fig. 2: Two secret sharing schemes for $n = 7$, $r = 4$ and $z = 1$ over $\mathbb{F}_{11}$. Both schemes store a secret of six symbols $(m_1, \dots, m_6)$. In both schemes, $k_1, k_2, k_3$ are i.i.d. uniformly distributed random variables. Scheme (a) is Shamir’s scheme (see Figure 1) repeated three times. In scheme (b), $f(x) = k_1 + m_1 x + m_2 x^2 + m_3 x^3 + m_4 x^4 + m_5 x^5 + m_6 x^6$, $g(x) = k_2 + m_4 x + m_5 x^2 + m_6 x^3$, $h(x) = k_3 + m_3 x + m_6 x^2$, and party $i$ stores the evaluations $f(i)$, $g(i)$ and $h(i)$. Note that in (b), if all 7 parties are available, then the secret can be decoded by downloading only one symbol $f(i)$ from each party $i$ and then interpolating $f(x)$. If any 4 parties are available, then the secret can be decoded in the following way. Download two symbols $f(i), g(i)$ from each available party $i$ and first interpolate $g(x)$, which decodes all coefficients of $f(x)$ of degree larger than 3. The remaining unknown part of $f(x)$ is a degree-3 polynomial, so we have enough evaluations of $f(x)$ to interpolate it, hence completely decoding the secret. Similarly, if any 3 parties are available, then the secret can be decoded in the following way. Download all three symbols $f(i), g(i), h(i)$ from each available party $i$ and interpolate $h(x)$, which decodes the degree-3 coefficients of $f(x)$ and $g(x)$. Hence the remaining unknown part of $g(x)$ is a degree-2 polynomial and can be interpolated, which decodes the coefficients of $f(x)$ of degrees 4, 5, 6. Hence the remaining unknown part of $f(x)$ is a degree-2 polynomial and can be interpolated, decoding the complete secret. This shows that the scheme meets the reliability requirement. In fact, for $d = 3, 4, 7$, scheme (b) achieves the optimal decoding bandwidth when $d$ parties participate in decoding. The secrecy of the scheme derives from the secrecy of Shamir’s scheme, as each of the polynomials $f(x)$, $g(x)$ and $h(x)$ individually is an instance of Shamir’s scheme, and we show that combining them still meets the secrecy requirement. The construction is discussed in detail in Section IV.
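The decoding procedure described in the caption of Figure 2 can be checked with a short Python sketch, assuming the evaluation points $1, \dots, 7$ over $\mathbb{F}_{11}$ used in the figure. All helper names are hypothetical. `decode` returns the recovered secret together with the number of $\mathbb{F}_{11}$ symbols downloaded, and it only implements the three downloading strategies discussed for $d = 3, 4, 7$ (with 5 or 6 available parties it simply uses 4 of them; it needs at least 3).

```python
from itertools import combinations

P = 11  # Figure 2 works over F_11 with evaluation points 1..7


def ev(coeffs, x):
    """Evaluate a polynomial (coefficients lowest degree first) at x, mod P."""
    return sum(c * pow(x, d, P) for d, c in enumerate(coeffs)) % P


def interpolate(points):
    """Lagrange interpolation mod P: coefficients of the degree < len(points) polynomial."""
    t = len(points)
    coeffs = [0] * t
    for j, (xj, yj) in enumerate(points):
        num, denom = [1], 1
        for l, (xl, _) in enumerate(points):
            if l != j:
                num = [(a - xl * b) % P for a, b in zip([0] + num, num + [0])]
                denom = denom * (xj - xl) % P
        scale = yj * pow(denom, P - 2, P) % P
        for deg in range(t):
            coeffs[deg] = (coeffs[deg] + scale * num[deg]) % P
    return coeffs


def peel(xs, ys, known):
    """Subtract known high-degree terms {degree: coeff} from the evaluations ys at xs,
    then interpolate the remaining low-degree part (degree < len(xs))."""
    rest = [(x, (y - sum(c * pow(x, d, P) for d, c in known.items())) % P)
            for x, y in zip(xs, ys)]
    return interpolate(rest)


def encode(m, k):
    """m = (m1, ..., m6), k = (k1, k2, k3); party i stores (f(i), g(i), h(i))."""
    m1, m2, m3, m4, m5, m6 = m
    f = [k[0], m1, m2, m3, m4, m5, m6]
    g = [k[1], m4, m5, m6]
    h = [k[2], m3, m6]
    return {i: (ev(f, i), ev(g, i), ev(h, i)) for i in range(1, 8)}


def decode(shares, parties):
    """Return (secret, number of F_11 symbols downloaded), following the Fig. 2 caption."""
    xs = sorted(parties)
    if len(xs) == 7:  # one symbol f(i) from each of the 7 parties
        f = interpolate([(x, shares[x][0]) for x in xs])
        return tuple(f[1:]), 7
    if len(xs) >= 4:  # f(i), g(i) from 4 parties
        xs = xs[:4]
        _, m4, m5, m6 = interpolate([(x, shares[x][1]) for x in xs])
        _, m1, m2, m3 = peel(xs, [shares[x][0] for x in xs], {4: m4, 5: m5, 6: m6})
        return (m1, m2, m3, m4, m5, m6), 8
    xs = xs[:3]  # f(i), g(i), h(i) from 3 parties
    _, m3, m6 = interpolate([(x, shares[x][2]) for x in xs])
    _, m4, m5 = peel(xs, [shares[x][1] for x in xs], {3: m6})
    _, m1, m2 = peel(xs, [shares[x][0] for x in xs], {3: m3, 4: m4, 5: m5, 6: m6})
    return (m1, m2, m3, m4, m5, m6), 9


# Usage (fixed keys for reproducibility; keys must be uniformly random in practice).
m = (1, 2, 3, 4, 5, 6)
shares = encode(m, k=(7, 8, 9))
for d in (3, 4, 7):
    assert all(decode(shares, S)[0] == m for S in combinations(range(1, 8), d))
    print(d, decode(shares, tuple(range(1, d + 1)))[1])  # prints 3 9, then 4 8, then 7 7
```

The helper `peel` is the step that the caption calls interpolating “the remaining unknown part”: once higher-degree coefficients are known, their contribution is subtracted from each downloaded evaluation before interpolating.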

Consider Shamir’s ramp scheme in the example of Figure 1, which stores 2 symbols securely and reliably for the setting $n = 7$, $r = 4$ and $z = 1$. In order to decode the secret, a user needs to download 3 symbols from any 3 parties, and therefore the decoding bandwidth is 3 symbols. Now suppose the same scheme is repeated 3 times in order to store a secret of 6 symbols, as shown in Figure 2a. Then the decoding bandwidth is 9 symbols.

We propose a new scheme in Figure 2b that also stores a secret of 6 symbols for the same setting, 
using the same amount of storage space, and over the same field size. In this scheme, if any 3 parties are 
available, then similar to Shamir’s scheme, the secret can be decoded from the 9 symbols stored by the 
three parties. However, if any 4 parties are available, then the secret can be decoded by downloading 2 
symbols from each available party. Therefore, the decoding bandwidth is improved to 8 symbols. If all 7 
parties are available, then the secret can be decoded by downloading only 1 symbol from each party and 
so the decoding bandwidth is further reduced to 7 symbols. 

We use the examples in Figure 2 to highlight several ideas to reduce the decoding bandwidth. Firstly, 
the amount of communication depends on the number of available parties. In fact the necessary amount of 
communication decreases strictly as the number of available parties increases. Secondly, it is important to 
distribute multiple subshares (symbols) to a party (essentially using the ideas of array codes [6], [5]). In 
contrast, Shamir’s scheme only distributes one symbol to each party except for trivial repetitions. Thirdly, 
during decoding it is not always necessary to download the complete share stored by a party. In general, 
a party can preprocess its share and the user can download a function of the share. 

Compared to the schemes in [17], [20], the scheme in the example is improved and generalized in the following aspects. 1) The proposed scheme achieves the optimal bandwidth more flexibly. Specifically, the schemes in [17], [20] achieve the optimal bandwidth for a single specific number of available parties, whereas the proposed scheme can be designed to achieve it for several values of the number $d$ of available parties. In the example of Figure 2b, the scheme achieves the optimal bandwidth when $d = 3, 4, 7$. In general, we can construct schemes that achieve the optimal bandwidth for all $n - r \le d \le n$. 2) The proposed scheme is more flexible in rate. Specifically, the (perfect) schemes in [17], [20] have rate exactly $1/n$. The proposed scheme in the example has rate $2/7 > 1/n = 1/7$. In general, we can construct schemes of arbitrary rate.

We also remark on an interesting analogy between communication-efficient secret sharing and the well-studied subject of regenerating codes [8], [16], [13]. Consider a regenerating code of length $n$ that is able to correct $r > 1$ erasures. If only one erasure occurs, then compared to repairing from a minimum set of $n - r$ nodes, repairing from all the $n - 1$ available nodes significantly reduces the total amount of communication during the repair. In this sense, for both regenerating codes and communication-efficient secret sharing, a key idea is to involve more available nodes/parties than the minimum required set during repair/decoding, for the purpose of reducing the repair/decoding bandwidth.

B. Results 

In Section III, we prove a tight information-theoretic lower bound on the decoding bandwidth, given a set of available parties $I \subseteq \{1, \dots, n\}$. The bound implies that the decoding bandwidth decreases as $|I|$ increases. The lower bound applies to both perfect and ramp schemes and generalizes the lower bound in [17]. In particular, we show that the communication overhead for the case $|I| = n$ is only a fraction $(n - r - z)/(n - z)$ of the communication overhead when $|I| = n - r$.

In Section IV, we construct efficient secret sharing schemes using the ideas described in Section I-A. Our construction utilizes Shamir’s scheme and achieves the optimal decoding bandwidth universally for all $I \in \mathcal{A}$. Additionally, the construction preserves the simplicity of Shamir’s scheme and is efficient in terms of both space and computation. Specifically, the scheme achieves optimal space efficiency and requires the same field size as Shamir’s scheme. Encoding and decoding the scheme are also similar to encoding and decoding Shamir’s scheme. The scheme shows that our lower bound in Section III is uniformly tight. Interestingly, the scheme also generalizes the construction in a recent independent work [2]. However, the flexibility of our framework allows improved efficiency in terms of computation, decoding delay and partial decoding.

In Section V, we construct another secret sharing scheme from Reed-Solomon codes. The scheme achieves the optimal decoding bandwidth when $|I| = n$ and $|I| = n - r$. The decoder of the scheme has a simpler structure than the decoder of the previous scheme and is therefore advantageous in terms of implementation. The scheme also offers a stronger level of reliability in that it allows decoding even if more than $r$ shares are partially lost. In Section VI, we present a scheme from random linear codes that achieves the optimal decoding bandwidth universally.

Finally, in the application of storage where each party is regarded as a disk, it is desirable to optimize the efficiency of disk operations. Our lower bound on the decoding bandwidth is naturally a lower bound on the number of symbol-reads from disks during decoding. In all of our schemes, the number of symbol-reads during decoding equals the amount of communication. Therefore, our schemes are also optimal in terms of disk operations. In addition, by involving more than the minimum number of disks in decoding, our schemes balance the load at the disks and achieve a higher degree of parallelization.
II. Secret Sharing Schemes

Consider the problem of storing a secret message $m$ securely and reliably into $n$ shares, so that 1) $m$ can be recovered from any $n - r$ shares, and 2) any $z$ shares do not reveal any information about $m$, i.e., they are statistically independent of $m$. Such a scheme is called a threshold secret sharing scheme, defined formally as follows. Let $\mathcal{Q}$ be a general $Q$-ary alphabet, i.e., $|\mathcal{Q}| = Q$. Denote $[n] = \{1, \dots, n\}$. For any index set $I \subseteq [n]$ and a vector $c = (c_1, \dots, c_n)$, denote $c_I = (c_i)_{i \in I}$.

Definition 1. An $(n, k, r, z)_Q$ secret sharing scheme consists of a randomized encoding function $F$ that maps a secret $m \in \mathcal{Q}^k$ to $c = (c_1, \dots, c_n) = F(m) \in \mathcal{Q}^n$, such that

1) (Reliability) The secret $m$ can be decoded from any $n - r$ shares (entries) of $c$. This guarantees that $m$ is recoverable in the event of the loss of any $r$ shares. Formally,
$$H(m \mid c_I) = 0, \quad \forall I \subseteq [n],\ |I| = n - r. \tag{1}$$
Therefore, for any $I \subseteq [n]$ with $|I| = n - r$, there exists a decoding function $D_I : \mathcal{Q}^{n-r} \to \mathcal{Q}^k$ such that $D_I(c_I) = m$.

2) (Secrecy) Any $z$ shares of $c$ do not reveal any information about $m$. This guarantees that $m$ is secure if any $z$ shares are exposed to an eavesdropper. Formally,
$$H(m \mid c_I) = H(m), \quad \forall I \subseteq [n],\ |I| = z. \tag{2}$$
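As a small sanity check of the secrecy condition (2), the sketch below enumerates the key of the Figure 1 scheme, a $(7, 2, 4, 1)_{11}$ instance, and confirms that the distribution of any single share is the same for every secret, while the joint distribution of two shares is not. It is an illustrative brute-force check under the assumption that the key is uniform on $\mathbb{F}_{11}$; the function names are hypothetical.

```python
from collections import Counter

P = 11  # Figure 1 scheme: f(x) = m1 + m2*x + k*x^2 over F_11, parties 1..7


def share_distribution(m1, m2, i):
    """Distribution (counts over the 11 keys) of party i's share for secret (m1, m2)."""
    return Counter((m1 + m2 * i + k * i * i) % P for k in range(P))


def pair_distribution(m1, m2, i, j):
    """Joint distribution of the shares of parties i and j for secret (m1, m2)."""
    return Counter(((m1 + m2 * i + k * i * i) % P, (m1 + m2 * j + k * j * j) % P)
                   for k in range(P))


# Condition (2) with z = 1: every single share is uniform on F_11 whatever the secret,
# because k -> k * i^2 is a bijection of F_11 for i != 0.
uniform = Counter(range(P))
single_ok = all(share_distribution(m1, m2, i) == uniform
                for i in range(1, 8) for m1 in range(P) for m2 in range(P))

# Two shares are not protected (z = 1): their joint distribution depends on the secret.
pair_leaks = pair_distribution(0, 0, 1, 2) != pair_distribution(1, 0, 1, 2)
print(single_ok, pair_leaks)  # True True
```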

Define the rate of a scheme to be $k/n$, which measures the space efficiency. The following proposition gives an upper bound on the rate.

Proposition 1. For any $(n, k, r, z)_Q$ secret sharing scheme, it follows that
$$k \le n - r - z, \tag{3}$$
and so the rate of the scheme is at most $\frac{n - r - z}{n}$.
Proof: Let the message $m$ be uniformly distributed. Then
\begin{align}
k = H(m) &= H(m \mid c_{[z]}) \tag{4}\\
&\le H(m, c_{[n-r]} \mid c_{[z]}) \notag\\
&= H(m \mid c_{[n-r]}, c_{[z]}) + H(c_{[n-r]} \mid c_{[z]}) \tag{5}\\
&= H(c_{[n-r]} \mid c_{[z]}) \tag{6}\\
&\le H(c_{\{z+1, \dots, n-r\}}) \le n - r - z, \notag
\end{align}
where (4) follows from the secrecy requirement (2), (5) follows from the chain rule, and (6) follows from the reliability requirement (1). ■

A secret sharing scheme is rate-optimal if it achieves equality in (3). Note that the scheme is a perfect scheme if $z = n - r - 1$ and is a ramp scheme otherwise. Rate-optimal perfect secret sharing schemes are studied in the seminal work by Shamir [14], and were later generalized to ramp schemes [4], [18]. Note that by (3), the rate of any perfect scheme is at most $1/n$, as $k \le n - r - z = 1$. Any scheme of a higher rate is necessarily a ramp scheme.

III. Lower Bound on Communication Overhead 

Suppose that the $n$ shares of the secret are stored by $n$ parties or distributed storage nodes¹, and a user wants to decode the secret. By Definition 1, the user can connect to any $n - r$ nodes and download one share, i.e., one $Q$-ary symbol, from each node. Therefore, by communicating $n - r$ symbols, the user can decode a secret of $k \le n - r - z$ symbols. Clearly, a communication overhead of at least $z$ symbols occurs during decoding. The question is whether it is possible to reduce the communication overhead. We answer this question affirmatively in the remainder of the paper.

There are two key ideas for improving the communication overhead. Firstly, in many practical scenarios, and particularly in distributed storage systems, more than $n - r$ nodes are often available. In this case, it is not necessary to restrict the user to download from only $n - r$ nodes. Secondly, it is not necessary to download the complete share stored by a node. Instead, it may suffice to communicate only a part of the share or, in general, a function of the share. In other words, a node can preprocess its share before transmitting it to the user.

1 In what follows we do not distinguish between parties and nodes. 
Motivated by these ideas, for any $I \subseteq [n]$ with $|I| \ge n - r$, define a class of preprocessing functions $E_{I,i} : \mathcal{Q} \to \mathcal{S}_{I,i}$, where $|\mathcal{S}_{I,i}| \le |\mathcal{Q}|$, that maps $c_i$ to $e_{I,i} = E_{I,i}(c_i)$. Let $e_I = (e_{I,i})_{i \in I}$, and define a class of decoding functions $D_I : \prod_{i \in I} \mathcal{S}_{I,i} \to \mathcal{Q}^k$ such that $D_I(e_I) = m$. As a naive example, consider any $I$ such that $|I| = n - r$. Then for $i \in I$, we can let $\mathcal{S}_{I,i} = \mathcal{Q}$, let $E_{I,i}$ be the identity function, and let $D_I$ be the decoding function described in Definition 1. In the remainder of the paper, when $I$ is clear from the context, we suppress it in the subscripts of $\mathcal{S}_{I,i}$, $E_{I,i}$, $e_{I,i}$ and $e_I$, and denote them by $\mathcal{S}_i$, $E_i$, $e_i$ and $e$ instead. We now formally define the notion of communication overhead in decoding. Note that all log functions in the paper are base $Q$.

Definition 2. For any $I$ such that $|I| \ge n - r$, define the communication overhead function to be $\mathrm{CO}(I) = \log \prod_{i \in I} |\mathcal{S}_i| - k$. Namely, $\mathrm{CO}(I)$ is the amount of extra information, measured in $Q$-ary symbols, that one needs to communicate in order to decode a secret of $k$ symbols, provided that the set of available shares is indexed by $I$.


The following result provides a lower bound on the communication overhead function. It generalizes the lower bound in [17] for perfect schemes, i.e., schemes with $k = 1$.

Theorem 1. For any $(n, k, r, z)_Q$ secret sharing scheme with preprocessing functions $\{E_{I,i}\}_{I \subseteq [n], |I| \ge n - r, i \in I}$ and decoding functions $\{D_I\}_{|I| \ge n - r}$, it follows that
$$\mathrm{CO}(I) \ge \frac{kz}{|I| - z}. \tag{7}$$

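Proof: Consider an arbitrary $I = \{i_1, \dots, i_{|I|}\}$ such that $|I| \ge n - r$, and let $m$ be uniformly distributed. Assume without loss of generality that $|\mathcal{S}_{i_1}| \le \dots \le |\mathcal{S}_{i_{|I|}}|$. Recall that $e_I = (e_{i_1}, \dots, e_{i_{|I|}})$ is the output of the preprocessing functions. Then
\begin{align}
H(e_{i_1}, \dots, e_{i_{|I|-z}})
&\overset{(a)}{\ge} H(e_{i_1}, \dots, e_{i_{|I|-z}} \mid e_{i_{|I|-z+1}}, \dots, e_{i_{|I|}}) \notag\\
&\overset{(b)}{=} H(e_{i_1}, \dots, e_{i_{|I|-z}} \mid e_{i_{|I|-z+1}}, \dots, e_{i_{|I|}}) + H(m \mid e_{i_1}, \dots, e_{i_{|I|}}) \notag\\
&\overset{(c)}{=} H(m, e_{i_1}, \dots, e_{i_{|I|-z}} \mid e_{i_{|I|-z+1}}, \dots, e_{i_{|I|}}) \notag\\
&\ge H(m \mid e_{i_{|I|-z+1}}, \dots, e_{i_{|I|}}) \notag\\
&\overset{(d)}{=} H(m) = k, \tag{8}
\end{align}
where (a) follows from the fact that conditioning reduces entropy, (b) follows from (1), since $m = D_I(e_I)$ is a function of $e_I$, (c) follows from the chain rule, and (d) follows from (2), since each $e_{i_j}$ is a function of $c_{i_j}$. Therefore it follows from (8) that
$$\prod_{j=1}^{|I|-z} |\mathcal{S}_{i_j}| \ge Q^{H(e_{i_1}, \dots, e_{i_{|I|-z}})} \ge Q^k,$$
and so
$$\sum_{j=1}^{|I|-z} \log |\mathcal{S}_{i_j}| \ge k. \tag{9}$$
It then follows from $|\mathcal{S}_{i_1}| \le \dots \le |\mathcal{S}_{i_{|I|}}|$ that
$$\log |\mathcal{S}_{i_{|I|-z}}| \ge \frac{k}{|I| - z},$$
and that
$$\log |\mathcal{S}_{i_{|I|-z+j}}| \ge \log |\mathcal{S}_{i_{|I|-z}}| \ge \frac{k}{|I| - z}, \quad j = 1, \dots, z. \tag{10}$$
Combining (9) and (10), we have
$$\mathrm{CO}(I) = \sum_{j=1}^{|I|} \log |\mathcal{S}_{i_j}| - k \ge \frac{kz}{|I| - z}.$$
■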

The decoding bandwidth is defined to be the total number of $Q$-ary symbols the user downloads from the nodes, which equals $\mathrm{CO}(I) + k$. Theorem 1 suggests that the communication overhead and the decoding bandwidth decrease as the number of available nodes increases.
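The bound (7) is easy to tabulate. The following Python sketch (helper names hypothetical) computes the lower bound on the decoding bandwidth, $k + kz/(|I| - z)$ $Q$-ary symbols, with exact rational arithmetic. For the parameters of Figure 2 ($n = 7$, $r = 4$, $z = 1$, $k = 2$) and $Q$-ary symbols made of $b = 3$ symbols of $\mathbb{F}_{11}$, multiplying by $b$ gives 9, 8 and 7 field symbols for $|I| = 3, 4, 7$, which are the bandwidths reported for scheme (b).

```python
from fractions import Fraction


def overhead_lower_bound(k, z, size_I):
    """Right-hand side of (7): CO(I) >= k*z / (|I| - z), in Q-ary symbols."""
    return Fraction(k * z, size_I - z)


def bandwidth_lower_bound(k, z, size_I):
    """Decoding bandwidth CO(I) + k, in Q-ary symbols."""
    return k + overhead_lower_bound(k, z, size_I)


# Parameters of Figures 1 and 2: n = 7, r = 4, z = 1, hence k = n - r - z = 2.
n, r, z = 7, 4, 1
k = n - r - z
for d in range(n - r, n + 1):
    print(d, bandwidth_lower_bound(k, z, d), 3 * bandwidth_lower_bound(k, z, d))
# In F_11 symbols (b = 3): 9, 8, 15/2, 36/5, 7 for |I| = 3, ..., 7.

# The overhead at |I| = n is the fraction (n - r - z)/(n - z) of the overhead at |I| = n - r.
assert overhead_lower_bound(k, z, n) / overhead_lower_bound(k, z, n - r) == Fraction(n - r - z, n - z)
```

For $|I| = n - r$ and a rate-optimal scheme the bound evaluates to $n - r$ symbols, i.e., whole shares, so any saving has to come from involving more than $n - r$ nodes.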

For rate-optimal schemes, Theorem 1 implies that if $|I| = n - r$, then the communication overhead is at least $z$, i.e., the user needs to download the complete share from each available node. The naive decoding function $D_I$ in Definition 1 trivially achieves this bound. The more interesting scenario is the regime $|I| > n - r$. In this case, if (7) is tight, then one can achieve a non-trivial improvement in decoding bandwidth compared to the naive decoder. When $k = 1$ (i.e., for perfect schemes) and fixing any $d > n - r$, [17] constructs a rate-optimal scheme that achieves the lower bound (7) for any $I$ such that $|I| = d$. However, several interesting and important questions remain open. Firstly, is the lower bound uniformly tight, or in other words, is it possible to construct a scheme that achieves (7) universally for any $I$ such that $|I| \ge n - r$ (note that the scheme in [17] does not achieve the lower bound when $|I| \ne d$)? Secondly, is the bound tight when $k > 1$ (i.e., for ramp schemes), and how can such schemes be designed? We answer these questions in the following section.
IV. Construction from Shamir’s scheme

In this section we construct a rate-optimal scheme that achieves the optimal decoding bandwidth universally for all possible $I$, i.e., all sets of available nodes. This implies that the lower bound in Theorem 1 is uniformly tight. The scheme is based on Shamir’s scheme and preserves its simplicity and efficiency. The scheme is flexible in the parameters $n, k, r, z$ and hence is flexible in rate.

We first refer the reader to Figure 2b for an example of the scheme, and use it to describe the general idea of the construction. To construct a scheme that achieves the optimal decoding bandwidth when $d$ nodes are available, for all $d$ in a given set $\mathcal{D} \subseteq \{n - r, \dots, n\}$, we design a set of polynomials of different degrees. In particular, for all $d \in \mathcal{D}$, we design a number of polynomials of degree exactly $d - 1$, and store one evaluation of each polynomial at each node. For each polynomial, exactly $z$ of its coefficients are independent keys in order to meet the secrecy requirement. The remaining coefficients encode “information”: for the highest-degree polynomials (i.e., those of degree $d_{\max} - 1$, where $d_{\max} = \max_{d \in \mathcal{D}} d$), the coefficients encode the entire message; for any other polynomial, say $g(x)$, the information encoded in the coefficients of $g(x)$ is the high-degree coefficients of the polynomials of degree higher than that of $g(x)$. Such an arrangement of the coefficients enables decoding in a successive manner. Consider decoding when $d$ nodes are available, implying that $d$ evaluations of each polynomial are known and hence all polynomials of degree $d - 1$ can be interpolated. Then, roughly speaking, the arrangement ensures that the high-degree coefficients of some higher-degree polynomials are known, so that the remaining unknown parts of these polynomials can be interpolated. This in turn allows us to decode coefficients of additional high-degree polynomials and thus to interpolate them. The chain continues until all polynomials of degree higher than $d - 1$ are interpolated, implying that the message is decoded. Note that no polynomials of degree smaller than $d - 1$ are interpolated, and therefore the keys associated with them are not decoded. This leads to the saving in decoding bandwidth, and in fact this amount is the best one can expect to save, so the scheme achieves the optimal bandwidth. Below we describe the scheme formally.

A. Encoding 

Consider arbitrary parameters $n, r, z, \mathcal{D}$ and let $k = n - r - z$. We assume that $n - r \in \mathcal{D}$, since this is implied by the reliability requirement. Choose any prime power $q > n$; the scheme is $\mathbb{F}_q$-linear over the share alphabet $\mathcal{Q} = \mathbb{F}_q^b$, where $b$ is the number of $\mathbb{F}_q$ symbols stored by each node. The message $m$ is a vector over $\mathbb{F}_q$ of length $|m| = kb$. The choice of $b$ is determined by $\mathcal{D}$ in the following way. Let $|m|$ be the least common multiple of $\{d - z : d \in \mathcal{D}\}$, i.e., the smallest positive integer that is divisible by all elements of the set. Note that $|m|$ is indeed a multiple of $k = n - r - z$, and we let $b = |m|/k$. This is the smallest choice of $|m|$ (and thus $b$) that ensures that, when $d \in \mathcal{D}$ nodes are available, the optimal bandwidth, measured in $\mathbb{F}_q$ symbols, is an integer.

We now construct $b$ polynomials over $\mathbb{F}_q$, evaluate each of them at $n$ distinct non-zero points, and let every node store an evaluation of each polynomial. Let $\mathcal{D} = \{d_1, d_2, \dots, d_{|\mathcal{D}|}\}$ such that $n \ge d_1 > d_2 > \dots > d_{|\mathcal{D}|} = n - r$. For $i \in [|\mathcal{D}|]$, let
$$p_i = \begin{cases} \dfrac{|m|}{d_1 - z}, & i = 1,\\[2mm] \dfrac{|m|}{d_i - z} - \dfrac{|m|}{d_{i-1} - z}, & i > 1. \end{cases} \tag{11}$$
We construct $p_i$ polynomials of degree $d_i - 1$. For all polynomials, their $z$ lowest-degree coefficients are independent random keys. We next define the remaining $d_i - z$ non-key coefficients. We first define them for the highest-degree polynomials, and then recursively define them for the lower-degree polynomials. For $i = 1$, the non-key coefficients of the polynomials of degree $d_1 - 1$ are message symbols. Note that there are $|m|$ message symbols and $p_1 = |m|/(d_1 - z)$ polynomials of degree $d_1 - 1$. Each such polynomial has $d_1 - z$ non-key coefficients, and so there are exactly enough coefficients to encode the message symbols. For $i > 1$, the non-key coefficients encode the coefficients of degrees $d_i$ to $d_{i-1} - 1$ of all polynomials of degree higher than $d_i - 1$. Note that there are $\sum_{j=1}^{i-1} p_j = \frac{|m|}{d_{i-1} - z}$ higher-degree polynomials, and so the total number of coefficients to encode is $(d_{i-1} - d_i)\frac{|m|}{d_{i-1} - z}$. On the other hand, there are $p_i$ polynomials of degree $d_i - 1$, each of which has $d_i - z$ non-key coefficients, and so the total number of non-key coefficients is
$$(d_i - z)\left(\frac{|m|}{d_i - z} - \frac{|m|}{d_{i-1} - z}\right) = (d_{i-1} - d_i)\frac{|m|}{d_{i-1} - z}.$$
The two numbers are therefore equal, and so there are exactly enough coefficients to encode. Note that the specific way to map the coefficients is not important and any one-to-one mapping suffices. Finally, evaluate each polynomial at $n$ distinct non-zero points and store an evaluation of each polynomial at each node. This completes the scheme. Note that the total number of polynomials is indeed $\sum_{i=1}^{|\mathcal{D}|} p_i = \frac{|m|}{d_{|\mathcal{D}|} - z} = \frac{|m|}{n - r - z} = b$, implying that the scheme is rate-optimal.
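The parameter choices of the encoder can be computed mechanically. The Python sketch below (a hypothetical helper, assuming Python 3.9+ for the multi-argument `math.lcm`) returns $|m|$, $b$ and the polynomial counts $p_i$ of (11), and asserts the two counting identities used above. For $\mathcal{D} = \{3, 4, 7\}$ with $n = 7$, $r = 4$, $z = 1$ it yields $|m| = 6$, $b = 3$ and one polynomial per degree, matching Figure 2b.

```python
from math import lcm  # multi-argument lcm requires Python 3.9+


def construction_parameters(n, r, z, D):
    """Return (|m|, b, [p_1, ..., p_|D|]) as in Section IV-A.

    D is the set of numbers of available nodes for which the bandwidth should be
    optimal; as in the text, its smallest element must be n - r."""
    k = n - r - z
    ds = sorted(D, reverse=True)  # d_1 > d_2 > ... > d_|D|
    assert ds[-1] == n - r and ds[0] <= n
    m_len = lcm(*(d - z for d in ds))  # |m| = lcm{d - z : d in D}
    b = m_len // k
    p = [m_len // (ds[0] - z)]  # (11), i = 1
    for i in range(1, len(ds)):
        p.append(m_len // (ds[i] - z) - m_len // (ds[i - 1] - z))  # (11), i > 1
        # Non-key coefficients at level i match the coefficients they must encode.
        assert (ds[i] - z) * p[i] == (ds[i - 1] - ds[i]) * (m_len // (ds[i - 1] - z))
    assert sum(p) == b  # total number of polynomials equals b (rate-optimality)
    return m_len, b, p


print(construction_parameters(7, 4, 1, {3, 4, 7}))        # (6, 3, [1, 1, 1])
print(construction_parameters(7, 4, 1, {3, 4, 5, 6, 7}))  # (60, 30, [10, 2, 3, 5, 10])
```

The second call illustrates the cost of universality: optimizing for every $d$ between $n - r$ and $n$ makes $b$ the least common multiple-driven value 30 rather than 3.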


B. Decoding 

For any $d_i \in \mathcal{D}$, we describe the decoding algorithm of the scheme when $d_i$ nodes are available. It achieves the optimal decoding bandwidth, and since $d_{|\mathcal{D}|} = n - r$, it implies that the scheme meets the reliability requirement. We first interpolate all polynomials of degree $d_i - 1$. After that, for all polynomials of degree $d_{i-1} - 1$, their coefficients of degree larger than $d_i - 1$ are known (as they are encoded in the coefficients of the polynomials of degree $d_i - 1$), and so they can be interpolated. In general, for $2 \le j \le i$, once the polynomials of degrees between $d_j - 1$ and $d_i - 1$ are interpolated, then for the polynomials of degree $d_{j-1} - 1$, their coefficients of degree larger than $d_i - 1$ are known by construction, and so they can be interpolated. Therefore we can successively interpolate the polynomials of higher degree until the polynomials of degree $d_1 - 1$ are interpolated, and so the message symbols are decoded. The total number of $\mathbb{F}_q$ symbols communicated is $d_i \sum_{j=1}^{i} p_j = \frac{d_i |m|}{d_i - z}$. By Theorem 1, and since each $Q$-ary symbol consists of $b$ symbols of $\mathbb{F}_q$, the decoding bandwidth is at least
$$|m| + \frac{kbz}{d_i - z} = kb + \frac{kbz}{d_i - z} = kb\left(1 + \frac{z}{d_i - z}\right) = \frac{d_i |m|}{d_i - z}$$
$\mathbb{F}_q$ symbols. Therefore the optimal bandwidth is achieved.
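The bandwidth accounting of the successive decoder can likewise be checked numerically. The sketch below (a hypothetical helper that recomputes the parameters of (11)) compares $d_i \sum_{j \le i} p_j$, the number of $\mathbb{F}_q$ symbols downloaded when $d_i$ nodes each send one evaluation of every polynomial of degree at least $d_i - 1$, with the Theorem 1 bound $kb + kbz/(d_i - z)$, and also lists the $(n - r)b$ symbols of the naive decoder that downloads whole shares from $n - r$ nodes.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm requires Python 3.9+


def bandwidth_table(n, r, z, D):
    """For each d_i in D (largest first), return (d_i, downloaded, bound, naive),
    all counted in F_q symbols."""
    k = n - r - z
    ds = sorted(D, reverse=True)
    m_len = lcm(*(d - z for d in ds))
    b = m_len // k
    p = [m_len // (ds[0] - z)] + [m_len // (ds[i] - z) - m_len // (ds[i - 1] - z)
                                  for i in range(1, len(ds))]
    rows = []
    for i, d in enumerate(ds):
        downloaded = d * sum(p[: i + 1])  # one evaluation of each polynomial of degree >= d - 1
        bound = k * b + Fraction(k * b * z, d - z)  # Theorem 1, converted to F_q symbols
        naive = (n - r) * b  # whole shares from n - r nodes
        rows.append((d, downloaded, bound, naive))
    return rows


for row in bandwidth_table(7, 4, 1, {3, 4, 5, 6, 7}):
    print(row)  # first row: (7, 70, Fraction(70, 1), 90)
```

In every row the downloaded count equals the bound, as the derivation above shows algebraically.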


C. Secrecy 

We show that the scheme is secure against $z$ eavesdropping nodes. Since each polynomial individually is an instance of Shamir’s scheme, the secrecy of the scheme derives from the secrecy of Shamir’s scheme. The main idea is to show that when these polynomials are combined, the resulting scheme is still secure. We first prove a simple lemma.

Lemma 1. Consider random variables $M_1, M_2, K_1, K_2$ such that $K_2$ is independent of $\{M_1, K_1, M_2\}$. For $i = 1, 2$, let $F_i$ be a deterministic function of $M_i, K_i$. If $I(M_1; F_1) = 0$ and $I(M_2; F_2) = 0$, then $I(M_1; F_1, F_2) = 0$. In addition, if $K_1$ is independent of $\{M_1, M_2\}$, then $I(M_1, M_2; F_1, F_2) = 0$.

Proof: We start with the first statement. Since $F_2$ is a function of $K_2, M_2$, and $K_2$ is independent of $\{M_1, K_1, M_2\}$ (and hence of $\{M_1, K_1, F_1, M_2\}$), it follows that $F_2$ is independent of $\{M_1, K_1, F_1\}$ conditioned on $M_2$, implying the Markov chain $\{M_1, K_1, F_1\} \to M_2 \to F_2$. Therefore $I(M_1, K_1, F_1, M_2; F_2) = I(M_2; F_2) = 0$, i.e., $F_2$ and $\{M_1, K_1, F_1, M_2\}$ are independent. Hence

$$I(M_1; F_1, F_2) = I(M_1; F_2) + I(M_1; F_1 \mid F_2) \overset{(a)}{=} I(M_1; F_1 \mid F_2) \overset{(b)}{=} I(M_1; F_1) = 0,$$

where (a) and (b) follow from the fact that $F_2$ is independent of $\{M_1, F_1\}$.

To prove the second statement, note that since $K_1$ is independent of $\{M_1, M_2\}$ and $F_1$ is a function of $M_1, K_1$, we have the Markov chain $M_2 \to M_1 \to F_1$, from which it follows that $I(M_1, M_2; F_1) = I(M_1; F_1) = 0$. Similarly, because $K_2$ is independent of $\{M_1, K_1, M_2\}$ and $F_2$ is a function of $M_2, K_2$, we have the Markov chain $\{M_1, F_1\} \to M_2 \to F_2$. By this chain it follows that $I(M_1, F_1, M_2; F_2) = I(M_2; F_2) = 0$, i.e., $\{M_1, F_1, M_2\}$ is independent of $F_2$. Therefore $I(M_1, M_2; F_2 \mid F_1) = 0$, and so $I(M_1, M_2; F_1, F_2) = I(M_1, M_2; F_1) + I(M_1, M_2; F_2 \mid F_1) = 0$. ■
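Lemma 1 can be sanity-checked numerically on a tiny example of our own choosing: one-time pads over $\mathbb{F}_3$, i.e., $F_i = M_i + K_i$ with independent uniform inputs. The sketch computes the mutual information exactly by enumeration; it illustrates the lemma and is not a proof.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import log2

Q = 3  # small alphabet F_3 (illustrative choice)


def mutual_information(pairs):
    """Exact I(X;Y) in bits for a list of equally likely outcomes (x, y)."""
    total = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / total) * log2(Fraction(c * total, px[x] * py[y]))
               for (x, y), c in pxy.items())


# Second statement: M1, M2, K1, K2 independent and uniform; F_i = M_i + K_i.
second = [((m1, m2), ((m1 + k1) % Q, (m2 + k2) % Q))
          for m1, m2, k1, k2 in product(range(Q), repeat=4)]

# First statement with M2 = M1: no independence between K1 and M2 is required there.
first = [(m1, ((m1 + k1) % Q, (m1 + k2) % Q))
         for m1, k1, k2 in product(range(Q), repeat=3)]

print(mutual_information(second), mutual_information(first))  # 0.0 0.0
```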

Suppose that the adversary compromises $z$ nodes and obtains $z$ evaluations of each polynomial. Consider the $i$-th polynomial in the order that we define them; let $f_i$ denote the adversary's observation of this polynomial, let $k_i$ denote the key coefficients of this polynomial and let $m_i$ denote the non-key coefficients. The secrecy of Shamir's scheme implies that

$$I(m_i; f_i) = 0, \quad i = 1, \ldots, b. \qquad (12)$$

Consider the first $p_1$ polynomials, which are the polynomials of the highest degree $d_1 - 1$. By construction, $m_1, \ldots, m_{p_1}$ exactly encode the message $m$. We invoke Lemma 1 by regarding $m_1, k_1, f_1, m_2, k_2$ and $f_2$ as $M_1, K_1, F_1, M_2, K_2$ and $F_2$. By the second statement of the lemma it follows that $I(m_1, m_2; f_1, f_2) = 0$. Inductively, for $1 < i < p_1$, suppose that $I(m_1, \ldots, m_i; f_1, \ldots, f_i) = 0$. We regard $\{m_1, \ldots, m_i\}$ as $M_1$, $\{k_1, \ldots, k_i\}$ as $K_1$, $\{f_1, \ldots, f_i\}$ as $F_1$, and regard $m_{i+1}, k_{i+1}, f_{i+1}$ as $M_2, K_2, F_2$. It follows from Lemma 1 that $I(m_1, \ldots, m_{i+1}; f_1, \ldots, f_{i+1}) = 0$. By induction we have $I(m_1, \ldots, m_{p_1}; f_1, \ldots, f_{p_1}) = 0$.

We then regard $\{m_1, \ldots, m_{p_1}\} = m$ as $M_1$, $\{k_1, \ldots, k_{p_1}\}$ as $K_1$, $\{f_1, \ldots, f_{p_1}\}$ as $F_1$, and regard $m_{p_1+1}, k_{p_1+1}, f_{p_1+1}$ as $M_2, K_2, F_2$. Then it follows from the first statement of Lemma 1 that $I(m; f_1, \ldots, f_{p_1+1}) = 0$. Inductively, for $p_1 < i < b$, suppose that $I(m; f_1, \ldots, f_i) = 0$. We regard $m$ as $M_1$, $\{k_1, \ldots, k_i\}$ as $K_1$, $\{f_1, \ldots, f_i\}$ as $F_1$, and regard $m_{i+1}, k_{i+1}, f_{i+1}$ as $M_2, K_2, F_2$. By Lemma 1 we have $I(m; f_1, \ldots, f_{i+1}) = 0$. By induction it follows that $I(m; f_1, \ldots, f_b) = 0$, implying that the adversary learns no information about the message $m$. This completes the proof and we have the following theorem.

Theorem 2. Let $\mathcal{D} \subseteq \{n - r, n - r + 1, \ldots, n\}$. Then the encoding scheme constructed in Section IV-A is a rate-optimal $(n, k, r, z)$ secret sharing scheme. The scheme achieves the optimal decoding bandwidth when $d$ nodes participate in decoding, universally for all $d \in \mathcal{D}$.
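The conclusion of Theorem 2 can be checked by brute force on a toy instance. In this sketch the parameters are our assumptions ($z = 1$, prime field $\mathbb{F}_{13}$, seven nodes evaluating at $1, \ldots, 7$, and three polynomials $f$, $g$, $h$ that share message symbols, with a hypothetical layout for $h$). For a fixed message, if every key triple yields a distinct view at a node, then the view is uniform and hence independent of the message.

```python
from itertools import product

P = 13                                    # assumed prime field size
ALPHAS = [1, 2, 3, 4, 5, 6, 7]            # assumed evaluation points
LAYOUT = [                                 # keys in the lowest degree; h is hypothetical
    ["k1", "m1", "m2", "m3", "m4", "m5", "m6"],
    ["k2", "m4", "m5", "m6"],
    ["k3", "m3", "m6"],
]
MSG = [f"m{s}" for s in range(1, 7)]
KEYS = ["k1", "k2", "k3"]


def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):            # Horner's rule over F_P
        acc = (acc * x + c) % P
    return acc


def view(values, x):
    """What one compromised node (z = 1) observes: one evaluation per polynomial."""
    return tuple(evaluate([values[lab] for lab in labs], x) for labs in LAYOUT)


def view_is_uniform(message, x):
    """True if the P**3 key triples give P**3 distinct views, i.e. the view is uniform."""
    seen = set()
    for keys in product(range(P), repeat=len(KEYS)):
        values = dict(zip(MSG, message))
        values.update(zip(KEYS, keys))
        seen.add(view(values, x))
    return len(seen) == P ** len(KEYS)


print(all(view_is_uniform(msg, x)
          for msg in ([0] * 6, [1, 2, 3, 4, 5, 6], [12, 0, 7, 7, 3, 9])
          for x in ALPHAS))  # True
```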

D. Discussion 

We remark on some other important advantages and properties of our construction. Firstly, the scheme also achieves the optimal number of symbol-reads from disks in decoding. To see this, notice that the lower bound (7) on communication overhead is also a lower bound on the number of $Q$-ary symbols that need to be read from disks during decoding. The number of symbol-reads in the proposed scheme equals the amount of communication. Therefore our scheme achieves the lower bound and hence is optimal. Secondly, compared to most existing schemes, which decode from the minimum number of $n - r$ nodes, our scheme allows all available nodes (or, more flexibly, any $d \in \mathcal{D}$ nodes) to participate in decoding, and hence can help balance the load at the disks and achieve a higher degree of parallelization. Thirdly, the encoding and decoding of the scheme are similar to those of Shamir's scheme and are therefore efficient and practical. In particular, the scheme works over the same field as Shamir's scheme. Fourthly, the preprocessing functions rely only on $d = |I|$ instead of $I$, further simplifying implementation. Finally, the construction is flexible in the parameters, i.e., it works for arbitrary values of $n$, $r$, $z$ and $\mathcal{D}$.

An important idea in our scheme is to construct polynomials of different degrees in order to facilitate decoding when different numbers of nodes are available. Similar ideas also appear in the schemes in [17], [20]. The main technique that enables the improvement of our schemes is a more careful and flexible design of the numbers and degrees of the polynomials, as well as the arrangement of their coefficients.

Our scheme maps the high-degree coefficients of the higher degree polynomials into the coefficients of the lower degree polynomials, but the specific mapping is not important and any one-to-one mapping suffices. In practice, the flexibility in choosing the specific mapping is helpful. In particular, it is possible to substantially improve the (computational) encoding complexity of the scheme by choosing a mapping that maintains the order of the coefficients. Refer to Figure 2b for an example. We need to compute $m_4 x + m_5 x^2 + m_6 x^3$ in evaluating $g(x)$, and we can reuse this computation in evaluating $f(x)$, because $f(x)$ contains the same run of consecutive coefficients, $m_4 x^4 + m_5 x^5 + m_6 x^6$. In this example the reuse saves 2 multiplications and 2 additions.
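The reuse can be written out as follows. This is a minimal sketch over a hypothetical prime field $\mathbb{F}_{13}$, with keys in the lowest-degree coefficients so that $f(x) = k_1 + m_1 x + \cdots + m_6 x^6$ and $g(x) = k_2 + m_4 x + m_5 x^2 + m_6 x^3$ (our reading of the example). It uses $f(x) = k_1 + m_1 x + m_2 x^2 + m_3 x^3 + x^3 s$ with $s = m_4 x + m_5 x^2 + m_6 x^3 = g(x) - k_2$.

```python
import random

P = 13  # assumed prime field size


def eval_direct(coeffs, x):
    """Reference evaluation of sum(coeffs[t] * x**t) over F_P."""
    return sum(c * pow(x, t, P) for t, c in enumerate(coeffs)) % P


def eval_f_g_shared(k1, k2, m, x):
    """Evaluate f and g at x, computing the run m4 x + m5 x^2 + m6 x^3 only once."""
    m1, m2, m3, m4, m5, m6 = m
    x2 = x * x % P
    x3 = x2 * x % P
    s = (m4 * x + m5 * x2 + m6 * x3) % P                  # shared partial sum
    g = (k2 + s) % P                                       # g(x) = k2 + s
    f = (k1 + m1 * x + m2 * x2 + m3 * x3 + x3 * s) % P     # x^3 * s = m4 x^4 + m5 x^5 + m6 x^6
    return f, g


rng = random.Random(3)
k1, k2 = rng.randrange(P), rng.randrange(P)
m = [rng.randrange(P) for _ in range(6)]
for x in range(1, 8):
    f, g = eval_f_g_shared(k1, k2, m, x)
    assert f == eval_direct([k1] + m, x)
    assert g == eval_direct([k2] + m[3:], x)
```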

We also note that for all polynomials in our scheme, the $z$ lowest degree coefficients are independent keys. However, in general this is not necessary: in any polynomial, we can choose any $z$ consecutive coefficients to be independent keys, and use the remaining coefficients to encode information (i.e., message symbols and coefficients of higher degree polynomials). The resulting scheme is still valid and achieves the optimal decoding bandwidth universally. Under this observation, our scheme generalizes the scheme in a recent independent work [2]. In particular, our scheme is equivalent to the scheme in [2] if we require a specific coefficient mapping and let the $z$ highest (instead of lowest) coefficients of all polynomials be keys².

As noted above, the flexibility of our scheme in choosing the coefficient mapping is beneficial in practice. Furthermore, choosing the lowest degree coefficients to be keys has several practical advantages. Decoding the scheme involves sequentially interpolating the polynomials through multiple iterations, which can lead to undesirable delay, especially when $|\mathcal{D}|$ is large. To mitigate this issue, we wish to decode the message symbols "on the fly" in each iteration. Specifically, if $d$ nodes are available, then each time a polynomial is interpolated, exactly $d$ new message and/or key symbols are decoded. Since the number of symbols decoded in each interpolation, the total number of message symbols and the total number of key symbols to be decoded are all fixed, there is a trade-off between the decoding order of the key and message symbols. The optimal trade-off is to delay decoding the keys as much as possible, so that the maximum number of message symbols are decoded on the fly. Specifically, notice that by the time $i$ polynomials are interpolated, at least $zi$ key symbols are decoded, since each polynomial introduces $z$ independent key coefficients for secrecy. The optimal trade-off is achieved if exactly $zi$ keys are decoded, implying that $(d - z)i$ message symbols are decoded. Our scheme achieves this optimal trade-off by choosing the $z$ lowest degree coefficients to be keys. This is because, by construction, only coefficients of degree at least $d_{|\mathcal{D}|} = n - r > z$ are mapped to the coefficients of the lower degree polynomials. Hence the key coefficients are never mapped, implying that the remaining information coefficients encode only message symbols. Therefore, at any moment during the decoding process, our scheme always decodes the maximum number of message symbols. In other words, the decoding delay, measured in the number of iterations and averaged over all message symbols, is minimized. Moreover, the fact that each polynomial interpolation decodes a fixed number of $d - z$ new message symbols is helpful for implementation. On the other hand, choosing the $z$ highest degree coefficients to be keys implies that the keys are mapped to the coefficients of lower degree polynomials. Hence the keys are decoded earlier than necessary (since lower degree polynomials are interpolated earlier), and it is not possible to achieve the optimal trade-off. Consider the example in Figure 2b: if we switch the keys to the high-degree coefficients, then the polynomials are $f(x) = m_1 + m_2 x + m_3 x^2 + m_4 x^3 + m_5 x^4 + m_6 x^5 + k_1 x^6$, $g(x) = m_5 + m_6 x + k_1 x^2 + k_2 x^3$ and $h(x) = m_4 + k_2 x + k_3 x^2$. In the case that $d = 4$ nodes are available, only 2 message symbols, $m_5$ and $m_6$, are decoded in the first iteration, and the remaining 4 message symbols are decoded in the second (last) iteration. In comparison, the original scheme performs better by decoding 3 message symbols in each iteration. Finally, decoding the maximum number of message symbols on the fly is also beneficial for partial decoding, i.e., decoding a subset of the message symbols. In this case decoding can finish early once all symbols of interest are decoded, and our scheme maximizes the chance of finishing early.

² The scheme in [2] also lets a node evaluate all polynomials at the same point, whereas this is not necessary in our framework.
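The per-iteration counts quoted above can be reproduced by a symbolic sketch that tracks coefficient labels instead of field values. The keys-high layout is the one given in the text; for the keys-low layout, $f$ and $g$ follow the example and the layout of $h$ is our assumption. A polynomial is used only if it has at least $d$ coefficients, and polynomials are interpolated in increasing degree.

```python
LAYOUT_LOW = [["k1", "m1", "m2", "m3", "m4", "m5", "m6"],   # f
              ["k2", "m4", "m5", "m6"],                     # g
              ["k3", "m3", "m6"]]                           # h (hypothetical layout)
LAYOUT_HIGH = [["m1", "m2", "m3", "m4", "m5", "m6", "k1"],  # f, keys switched high
               ["m5", "m6", "k1", "k2"],                    # g
               ["m4", "k2", "k3"]]                          # h


def message_symbols_per_iteration(layout, d):
    """Number of new message symbols obtained by each interpolation with d nodes."""
    known, counts = set(), []
    for labs in sorted((p for p in layout if len(p) >= d), key=len):
        # Coefficients of degree >= d must come from polynomials interpolated earlier.
        assert set(labs[d:]) <= known
        new = [lab for lab in labs[:d] if lab not in known]
        known.update(labs[:d])
        counts.append(sum(lab.startswith("m") for lab in new))
    return counts


for d in (3, 4, 7):
    print(d, message_symbols_per_iteration(LAYOUT_LOW, d),
          message_symbols_per_iteration(LAYOUT_HIGH, d))
```

With $d = 4$ the sketch reports [3, 3] for the keys-low layout and [2, 4] for the keys-high layout, as stated above; with $d = 3$ it reports [2, 2, 2] versus [1, 2, 3].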

V. Construction from Reed-Solomon Codes 

In this section we present another rate-optimal secret sharing scheme that achieves the optimal decoding bandwidth when all $n$ nodes are available. The scheme is flexible in the parameters and hence is flexible in rate. The scheme is directly related to Reed-Solomon codes. In particular, the encoding matrix of the scheme is a generator matrix of a Reed-Solomon code, and so the scheme can be decoded as a Reed-Solomon code. This is an advantage over the scheme in the previous section, which requires recursive decoding. The scheme also provides a stronger level of reliability, in the sense that it allows decoding even if more than $r$ shares are partially erased. On the other hand, unlike the previous scheme, this scheme does not achieve the optimal decoding bandwidth universally, but rather only for $d = n - r$ and $d = n$. However, we remark that the case in which all $n$ nodes are available is particularly important, because it corresponds to the best case in terms of decoding bandwidth and is arguably the most relevant case for the application of distributed storage, where the storage nodes are usually highly available.

A. Encoding 

Fix $k = n - r - z$, let $q > n(k + r)$ be a prime power, and let the share alphabet be $Q = \mathbb{F}_q^{k+r}$. Note that each share is a length-$(k+r)$ vector over $\mathbb{F}_q$. For $j = 1, \ldots, n$, denote the $j$-th share by $c_j = (c_{1,j}, \ldots, c_{k+r,j})$, where $c_{i,j} \in \mathbb{F}_q$. The secret message $m$ is $k$ symbols over $Q$ and therefore can be regarded as a length-$k(k+r)$ vector over $\mathbb{F}_q$, denoted by $(m_1, \ldots, m_{k(k+r)})$. The encoder generates keys $k = (k_1, \ldots, k_{kz}) \in \mathbb{F}_q^{kz}$ and $k' = (k'_1, \ldots, k'_{rz}) \in \mathbb{F}_q^{rz}$ independently and uniformly at random. The encoding scheme is linear over $\mathbb{F}_q$ and is described by an encoding matrix $G$ over $\mathbb{F}_q$:

$$(c_{1,1}, \ldots, c_{1,n}, \ldots, c_{k+r,1}, \ldots, c_{k+r,n}) = (m_1, \ldots, m_{k(k+r)}, k_1, \ldots, k_{kz}, k'_1, \ldots, k'_{rz})\, G. \qquad (13)$$

Note that $G$ has $k(k+r) + kz + rz = nk + rz$ rows and $n(k+r)$ columns. In the following we discuss the construction of $G$ based on a Vandermonde matrix. We start with some notation. Let $\alpha_1, \ldots, \alpha_{n(k+r)}$ be distinct non-zero elements of $\mathbb{F}_q$, and let $V_{ij} = \alpha_j^{i-1}$, $i = 1, \ldots, nk + rz$, $j = 1, \ldots, n(k+r)$; then $V = (V_{ij})$ is a Vandermonde matrix of the same size as $G$. For an arbitrary vector $f = (f_0, \ldots, f_t)$ with entries in $\mathbb{F}_q$, we denote by $f[x]$ the polynomial $f_0 + f_1 x + \cdots + f_t x^t$ over $\mathbb{F}_q$ with indeterminate $x$. We construct a set of polynomials as follows:


fi[x] = X 1 1 

i = 1 ,..., kn, 

( 14 ) 

kn 

fkn-\-i[%\ — 11 (*£ ) 

3 =1 

i = 1 ,..., rz. 

( 15 ) 


Let fi,i = 1,..., kn + rz be the length -(kn + rz) vectors over F g corresponding to the polynomials. Stack 
the fi s to obtain a sqaure matrix of size (fcn + rz): 
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Finally, we complete the construction by setting 

G = TV. 


Example 1. Consider the setting that $n = 3$, $r = 1$, $z = 1$ and $k = n - r - z = 1$. Let $q = 7$ and $Q = \mathbb{F}_7^2$. Then $m = (m_1, m_2)$, $k = (k_1)$ and $k' = (k'_1)$. Construct a Vandermonde matrix over $\mathbb{F}_7$ as

$$V = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 4 & 2 & 2 & 4 & 1 \\ 1 & 1 & 6 & 1 & 6 & 6 \end{pmatrix}.$$

Construct polynomials $f_1[x] = 1$, $f_2[x] = x$, $f_3[x] = x^2$ and

$$f_4[x] = (x - 1)(x - 2)(x - 3) = 1 + 4x + x^2 + x^3.$$
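Continuing Example 1, the following sketch (plain Python over $\mathbb{F}_7$; helper names are ours) builds $T$ from (14) and (15), forms $G = TV$, and checks by direct rank computation the four properties established in Lemma 2.

```python
from itertools import combinations

P = 7
n, k, r, z = 3, 1, 1, 1
ALPHAS = [1, 2, 3, 4, 5, 6]             # alpha_1, ..., alpha_{n(k+r)}
KN, ROWS = k * n, k * n + r * z         # kn and nk + rz


def poly_mul(a, b):
    """Product of two coefficient lists (lowest degree first) over F_P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out


def rank_mod(mat):
    """Rank of a matrix over F_P by Gaussian elimination."""
    m = [row[:] for row in mat]
    rank = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], P - 2, P)
        m[rank] = [v * inv % P for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c]:
                f = m[i][c]
                m[i] = [(u - f * v) % P for u, v in zip(m[i], m[rank])]
        rank += 1
    return rank


# (14)-(15): f_i = x^(i-1) for i <= kn, then x^(i-1) * prod_{j <= kn} (x - alpha_j).
prod = [1]
for a in ALPHAS[:KN]:
    prod = poly_mul(prod, [(-a) % P, 1])
polys = [[0] * i + [1] for i in range(KN)] + [[0] * i + prod for i in range(r * z)]
T = [p + [0] * (ROWS - len(p)) for p in polys]           # coefficient vectors as rows
V = [[pow(a, i, P) for a in ALPHAS] for i in range(ROWS)]
G = [[sum(T[i][t] * V[t][j] for t in range(ROWS)) % P for j in range(len(ALPHAS))]
     for i in range(ROWS)]
print(prod)   # [1, 4, 1, 1], i.e. 1 + 4x + x^2 + x^3
print(G[-1])  # [0, 0, 0, 6, 3, 4]

checks = {
    "(i)": all(rank_mod([[row[c] for c in cols] for row in G]) == ROWS
               for cols in combinations(range(len(ALPHAS)), (n - r) * (k + r))),
    "(ii)": all(G[i][j] == pow(ALPHAS[j], i, P) for i in range(KN) for j in range(KN)),
    "(iii)": all(v == 0 for row in G[KN:] for v in row[:KN]),
    "(iv)": all(rank_mod([[row[c] for c in cols] for row in G[KN:]]) == r * z
                for cols in combinations(range(KN, len(ALPHAS)), r * z)),
}
print(checks)
```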





Lemma 2. Regard $G$ as a block matrix

$$G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix},$$

where $G_{11}$ has size $kn \times kn$, $G_{12}$ has size $kn \times rn$, $G_{21}$ has size $rz \times kn$, and $G_{22}$ has size $rz \times rn$. Then,

(i) Any $(n - r)(k + r)$ columns of $G$ are linearly independent.

(ii) $G_{11}$ is a Vandermonde matrix.

(iii) $G_{21} = 0$.

(iv) Any $rz$ columns of $G_{22}$ are linearly independent.

Proof: By construction, the polynomials $f_i[x]$, $i = 1, \ldots, kn + rz$, have distinct degrees and therefore are linearly independent. Therefore the rows of $T$ are linearly independent, and so $T$ is full rank. This implies that the row space of $G$ is the same as the row space of $V$. The row space of $V$ is a linear $(nk + nr, nk + rz)$ MDS code³ because $V$ is a Vandermonde matrix. Note that $nk + rz = (n - r)(k + r)$, and so the row space of $G$ is a linear $(nk + nr, (n - r)(k + r))$ MDS code. This proves (i).

To prove (ii), note that by (14), the first $kn$ rows of $G$ are exactly the first $kn$ rows of $V$. Therefore $G_{11}$ is a Vandermonde matrix.

To prove (iii), note that by construction the $(i, j)$-th entry of $G_{21}$ equals $f_{kn+i}[\alpha_j]$. By (15), $\alpha_j$ is a root of $f_{kn+i}[x]$ for $i = 1, \ldots, rz$, $j = 1, \ldots, kn$. Hence $G_{21} = 0$.

Finally we prove (iv). By construction the $(i, j)$-th entry of $G_{22}$ equals

$$f_{kn+i}[\alpha_{kn+j}] = \alpha_{kn+j}^{i-1} \prod_{l=1}^{kn} (\alpha_{kn+j} - \alpha_l) = \alpha_{kn+j}^{i-1} f^*[\alpha_{kn+j}], \qquad (17)$$

where $f^*[x] = \prod_{l=1}^{kn} (x - \alpha_l)$. Since $\alpha_1, \ldots, \alpha_{(k+r)n}$ are distinct elements, it follows that $f^*[\alpha_{kn+j}] \neq 0$ for $j = 1, \ldots, rn$. Let $1 \le j_1 < j_2 < \cdots < j_{rz} \le rn$ and consider the submatrix formed by the $j_1$-th, ..., $j_{rz}$-th columns of $G_{22}$. By (17), the $l$-th column of the submatrix is formed by consecutive powers of $\alpha_{kn+j_l}$, scaled by $f^*[\alpha_{kn+j_l}]$. Therefore the determinant of the submatrix is

$$\prod_{l=1}^{rz} f^*[\alpha_{kn+j_l}] \prod_{1 \le u < v \le rz} (\alpha_{kn+j_v} - \alpha_{kn+j_u}) \neq 0.$$

This shows that any $rz$ columns of $G_{22}$ are linearly independent. ■

B. Decoding 

We describe the decoding procedure for two cases: 1) $|I| = n$, i.e., all nodes are available, and 2) $|I| < n$. First consider the case that $|I| = n$, i.e., $I = [n]$. In this case it suffices to read and communicate the first $k$ symbols over $\mathbb{F}_q$ from each share. Formally, the user downloads $e = (c_{1,1}, \ldots, c_{1,n}, \ldots, c_{k,1}, \ldots, c_{k,n})$. By Lemma 2.(ii), $G_{11}$ is a square Vandermonde matrix with distinct nodes and hence is invertible. Denoting its inverse by $G_{11}^{-1}$, the secret can be recovered by

$$e\, G_{11}^{-1} \overset{(e)}{=} (m_1, \ldots, m_{k(k+r)}, k_1, \ldots, k_{kz}),$$

where (e) follows from (13) and Lemma 2.(iii). The decoding process involves communicating $kn$ symbols from $\mathbb{F}_q$. The communication overhead is $kz$ symbols over $\mathbb{F}_q$, or $\frac{kz}{k+r} = \frac{kz}{n-z}$ $Q$-ary symbols, which achieves the lower bound (7) and is therefore optimal.

³ In fact this is the Reed-Solomon code.

Next consider the case that $n - r \le |I| < n$. Select an arbitrary subset $I'$ of $I$ of size $n - r$, and download the complete shares stored by the nodes in $I'$. Hence the downloaded information $e$ is a length-$(n - r)(k + r)$ vector over $\mathbb{F}_q$. By Lemma 2.(i), any $(n - r)(k + r)$ columns of $G$ are linearly independent, and therefore the submatrix formed by these columns is invertible. The secret $m$ can then be recovered by multiplying $e$ with the inverse. An alternative way to decode the secret is to notice that $G$ is an encoding matrix of an $(nk + nr, nk + rz)$ Reed-Solomon code over $\mathbb{F}_q$. Therefore one may employ a standard Reed-Solomon decoder to correct any $r(k + r)$ erasures or $\lfloor r(k + r)/2 \rfloor$ errors of symbols over $\mathbb{F}_q$. Note that when at most $r$ nodes are unavailable, we regard their shares as erased; there are then at most $r(k + r)$ erasures of symbols over $\mathbb{F}_q$, which can therefore be corrected. In general, any $r(k + r)$ erasures or $\lfloor r(k + r)/2 \rfloor$ errors are correctable, even if they occur at more than $r$ nodes. The decoding process involves communicating $nk + rz$ symbols of $\mathbb{F}_q$. The communication overhead is $(n - r)(k + r) - k(k + r) = z(k + r)$ symbols over $\mathbb{F}_q$, or $z$ symbols over $Q$, which achieves the lower bound (7) if and only if $|I| = n - r$.
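Both decoding procedures can be illustrated on Example 1. The matrix below is $G = TV$ for that example, with entries we computed from its $T$ and $V$; the solver and function names are ours. With all three nodes the user downloads $kn = 3$ symbols of $\mathbb{F}_7$, and with $n - r = 2$ nodes it downloads $nk + rz = 4$ symbols, while the secret itself is $k(k+r) = 2$ symbols.

```python
import random
from itertools import combinations

P = 7
N_NODES = 3
# G = TV for Example 1: columns 0-2 give c_{1,1..3}, columns 3-5 give c_{2,1..3}.
G = [[1, 1, 1, 1, 1, 1],
     [1, 2, 3, 4, 5, 6],
     [1, 4, 2, 2, 4, 1],
     [0, 0, 0, 6, 3, 4]]


def solve_mod(a, y):
    """Solve the square, invertible system a @ x = y over F_P (Gauss-Jordan)."""
    size = len(a)
    m = [row[:] + [yi % P] for row, yi in zip(a, y)]
    for col in range(size):
        piv = next(i for i in range(col, size) if m[i][col])
        m[col], m[piv] = m[piv], m[col]
        inv = pow(m[col][col], P - 2, P)
        m[col] = [v * inv % P for v in m[col]]
        for i in range(size):
            if i != col and m[i][col]:
                f = m[i][col]
                m[i] = [(u - f * v) % P for u, v in zip(m[i], m[col])]
    return [m[i][size] for i in range(size)]


def encode(m1, m2, k1, kp1):
    """(13): the six coded symbols are (m1, m2, k1, k'1) G; node j keeps two of them."""
    x = [m1, m2, k1, kp1]
    c = [sum(x[i] * G[i][j] for i in range(4)) % P for j in range(6)]
    return [(c[j], c[N_NODES + j]) for j in range(N_NODES)]


def decode_all(shares):
    """|I| = n: download only the first symbol of every share (kn = 3 symbols)."""
    e = [s[0] for s in shares]
    # e_j = sum_i x_i G[i][j]; row 3 of G vanishes on these columns (Lemma 2.(iii)).
    eqs = [[G[i][j] for i in range(3)] for j in range(N_NODES)]
    m1, m2, _k1 = solve_mod(eqs, e)
    return (m1, m2), len(e)


def decode_subset(shares, nodes):
    """|I| = n - r = 2: download the complete shares of two nodes (4 symbols)."""
    cols = list(nodes) + [N_NODES + j for j in nodes]
    e = [shares[j][0] for j in nodes] + [shares[j][1] for j in nodes]
    eqs = [[G[i][c] for i in range(4)] for c in cols]
    m1, m2, _k1, _kp1 = solve_mod(eqs, e)
    return (m1, m2), len(e)


rng = random.Random(0)
secret = (rng.randrange(P), rng.randrange(P))
shares = encode(*secret, rng.randrange(P), rng.randrange(P))
print(secret, decode_all(shares))
for pair in combinations(range(N_NODES), 2):
    print(pair, decode_subset(shares, pair))
```

The overheads are therefore $3 - 2 = 1 = kz$ and $4 - 2 = 2 = z(k + r)$ symbols of $\mathbb{F}_7$, matching the two cases above.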

C. Analysis 

Theorem 3. The encoding scheme constructed in Section V-A is a rate-optimal $(n, k, r, z)$ secret sharing scheme. The scheme achieves the optimal decoding bandwidth when $d$ nodes participate in decoding, for $d = n$ or $d = n - r$.

Proof: We need to verify that the encoding scheme meets the reliability requirement and the security requirement of a secret sharing scheme, formally defined in Definition 1. An explicit decoding scheme and its communication overhead are discussed in Section V-B, and therefore the reliability requirement is met. The scheme is rate-optimal because $k = n - r - z$. It remains to show that the encoding scheme is secure. To this end, we first show that $H(k, k' \mid c_I, m) = 0$ for all $I$ such that $|I| = z$. In other words, the random symbols generated by the encoder are completely determined by $c_I$ and the secret. Denote the submatrix formed by the first $k(k + r)$ rows of $G$ by $G_{\mathrm{top}}$, and the submatrix formed by the remaining $(k + r)z$ rows of $G$ by $G_{\mathrm{low}}$. Consider any $I = \{i_1, \ldots, i_z\}$ and let $c_I = (c_{1,i_1}, \ldots, c_{1,i_z}, \ldots, c_{k+r,i_1}, \ldots, c_{k+r,i_z})$. It then follows from (13) that

$$c_I = (m_1, \ldots, m_{k(k+r)})\, G_{\mathrm{top},I} + (k_1, \ldots, k_{kz}, k'_1, \ldots, k'_{rz})\, G_{\mathrm{low},I},$$

where $G_{\mathrm{top},I}$ is the submatrix formed by the subset of columns $\{i + j \mid i \in I,\ j = 0, n, \ldots, (k + r - 1)n\}$ of $G_{\mathrm{top}}$, and $G_{\mathrm{low},I}$ is the submatrix formed by the same subset of columns of $G_{\mathrm{low}}$. Therefore, written concisely,

$$(k\ k')\, G_{\mathrm{low},I} = c_I - m\, G_{\mathrm{top},I}. \qquad (18)$$


To study the rank of $G_{\mathrm{low},I}$, note that it is a square matrix of size $(k + r)z$, and we regard it as a block matrix

$$G_{\mathrm{low},I} = \begin{pmatrix} G'_{11} & G'_{12} \\ G'_{21} & G'_{22} \end{pmatrix}, \qquad (19)$$

where $G'_{11}$ has size $kz \times kz$, $G'_{12}$ has size $kz \times rz$, $G'_{21}$ has size $rz \times kz$ and $G'_{22}$ has size $rz \times rz$. By Lemma 2.(ii), $G'_{11}$ is a block of a Vandermonde matrix and therefore is invertible. By Lemma 2.(iii), $G'_{21} = 0$. Denote $c_I - m\, G_{\mathrm{top},I}$ by $(u_1, \ldots, u_{(k+r)z})$; then the above two facts together with (18) imply

$$k = (u_1, \ldots, u_{kz})\, G'^{-1}_{11}. \qquad (20)$$

Therefore $k$ is a deterministic function of $m$ and $c_I$. It follows from (18) that

$$k'\, G'_{22} = (u_{kz+1}, \ldots, u_{(k+r)z}) - k\, G'_{12}.$$

By Lemma 2.(iv), $G'_{22}$ is invertible, and therefore

$$k' = \big((u_{kz+1}, \ldots, u_{(k+r)z}) - k\, G'_{12}\big)\, G'^{-1}_{22}. \qquad (21)$$

This shows that $k'$ is a deterministic function of $k$, $c_I$ and $m$, and so

$$H(k, k' \mid c_I, m) = 0. \qquad (22)$$
It then follows that

$$\begin{aligned}
H(m) - H(m \mid c_I) &= I(m; c_I) \\
&= H(c_I) - H(c_I \mid m) \\
&\overset{(f)}{\le} z - H(c_I \mid m) \\
&\overset{(g)}{=} z - H(c_I \mid m) + H(c_I \mid m, k, k') \\
&= z - I(c_I; k, k' \mid m) \\
&= z - H(k, k' \mid m) + H(k, k' \mid c_I, m) \\
&\overset{(h)}{=} z - H(k, k' \mid m) \\
&\overset{(i)}{=} z - H(k, k') \\
&\overset{(j)}{=} z - z = 0, \qquad (23)
\end{aligned}$$

where (f) is due to $|I| = z$; (g) is due to the fact that $c_I$ is a function of $m$, $k$ and $k'$; (h) is due to (22); (i) is due to the fact that $k, k'$ are independent of $m$; and (j) follows from the fact that $k, k'$ are uniformly distributed. Therefore $H(m) = H(m \mid c_I)$ and the security requirement is met. This completes the proof that the encoding scheme is a valid secret sharing scheme. ■
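For Example 1 the security argument can be checked exhaustively. There $z = 1$, $G_{\mathrm{top}}$ is the first two rows of $G$ (multiplying $m_1, m_2$) and $G_{\mathrm{low}}$ is the last two rows (multiplying $k_1, k'_1$); the matrix $G$ below was computed by us from $T$ and $V$ in Example 1. The sketch checks that $G_{\mathrm{low},I}$ is invertible for each single node and that, for every message, the 49 key pairs give 49 distinct views.

```python
from itertools import product

P = 7
N_NODES = 3
G = [[1, 1, 1, 1, 1, 1],        # rows 0-1 (G_top) multiply m1, m2
     [1, 2, 3, 4, 5, 6],
     [1, 4, 2, 2, 4, 1],        # rows 2-3 (G_low) multiply k1, k'1
     [0, 0, 0, 6, 3, 4]]


def node_view(node, m1, m2, k1, kp1):
    """The two symbols (c_{1,node}, c_{2,node}) stored by a single node."""
    x = (m1, m2, k1, kp1)
    return tuple(sum(x[i] * G[i][c] for i in range(4)) % P for c in (node, N_NODES + node))


def check_node(node):
    """G_low restricted to the node's columns is invertible and its view is uniform."""
    g_low = [[G[i][c] for c in (node, N_NODES + node)] for i in (2, 3)]
    det = (g_low[0][0] * g_low[1][1] - g_low[0][1] * g_low[1][0]) % P
    if det == 0:                  # keys would not be determined by (c_I, m)
        return False
    for m1, m2 in product(range(P), repeat=2):
        views = {node_view(node, m1, m2, k1, kp1) for k1, kp1 in product(range(P), repeat=2)}
        if len(views) != P * P:   # the view would not be uniform for this message
            return False
    return True


print([check_node(node) for node in range(N_NODES)])  # [True, True, True]
```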

Theorem 3 shows that the proposed secret sharing scheme is optimal in terms of storage usage and in terms of best-case (i.e., $|I| = n$) communication overhead. Compared to the scheme in the previous section, this scheme has advantages in terms of implementation and error correction, because decoding the scheme is equivalent to decoding standard Reed-Solomon codes. The scheme also provides a stronger level of reliability, in the sense that it allows decoding even if more than $r$ shares are partially erased. Similar to the previous discussion, the scheme achieves the optimal number of symbol-reads from disks when $|I| = n$. Finally, in the scheme all operations are performed over the field $\mathbb{F}_q$, where $q > n(k + r)$. This requirement on the field size can be relaxed in the following simple way. Let $\beta$ be the greatest common divisor of $k$ and $r$; then instead of choosing $Q$ to be $\mathbb{F}_q^{k+r}$, we can let $Q = \mathbb{F}_q^{(k+r)/\beta}$, $m = (m_1, \ldots, m_{k(k+r)/\beta})$, $k = (k_1, \ldots, k_{kz/\beta})$ and $k' = (k'_1, \ldots, k'_{rz/\beta})$. The resulting scheme is a rate-optimal $(n, k, r, z)$ secret sharing scheme with the same communication overhead function as the original scheme. For this modified construction, it is sufficient to choose any field size $q > n(k + r)/\beta$.
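The bookkeeping of the relaxed construction can be sketched as follows, assuming the relaxed parameters above. The function and the example parameters are hypothetical; the number of columns of $G$ is the number of distinct non-zero evaluation points that the Vandermonde construction needs.

```python
from math import gcd


def relaxed_dimensions(n, r, z, relaxed=True):
    """Sizes for the Section V construction, optionally with the gcd relaxation."""
    k = n - r - z
    beta = gcd(k, r) if relaxed else 1
    share_len = (k + r) // beta             # F_q symbols per share
    msg_len = k * (k + r) // beta           # length of m over F_q
    key_len, key2_len = k * z // beta, r * z // beta
    rows = msg_len + key_len + key2_len     # rows of G
    cols = n * share_len                    # columns of G = distinct alphas needed
    assert rows == (k * n + r * z) // beta == (n - r) * share_len
    return {"beta": beta, "share_len": share_len, "rows": rows, "cols": cols}


print(relaxed_dimensions(10, 4, 2, relaxed=False))  # beta 1: 80 columns
print(relaxed_dimensions(10, 4, 2))                 # beta 4: 20 columns
```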
VI. Secret Sharing Schemes from Random Codes

In this section we describe a rate-optimal (perfect or ramp) secret sharing scheme based on random 
linear codes that achieves the optimal decoding bandwidth universally. The scheme meets the secrecy 
requirement deterministically, and meets the reliability requirement with high probability as the field size 
grows. 

A. Encoding 

Let $k = n - r - z$, let $q$ be a prime power, and let $N$ be the least common multiple of $\{n - z - r, n - z - r + 1, \ldots, n - z\}$. Set $Q = \mathbb{F}_q^{N(k+r)}$. Therefore each share of the secret is a length-$N(k+r)$ vector over $\mathbb{F}_q$. For $j = 1, \ldots, n$, denote the $j$-th share by $c_j = (c_{1,j}, \ldots, c_{N(k+r),j})$, where $c_{i,j} \in \mathbb{F}_q$. The secret $m$ consists of $k$ symbols over $Q$ and is regarded as a length-$Nk(k+r)$ vector over $\mathbb{F}_q$, denoted by $(m_1, \ldots, m_{Nk(k+r)})$. The encoder generates uniformly distributed random vectors $k = (k_1, \ldots, k_{Nkz}) \in \mathbb{F}_q^{Nkz}$ and $k' = (k'_1, \ldots, k'_{Nrz}) \in \mathbb{F}_q^{Nrz}$, independently of $m$. The encoding scheme is described by a set of $N(k+r)$ encoding matrices $G_i \in \mathbb{F}_q^{N(kn+rz) \times n}$, $i = 1, \ldots, N(k+r)$, such that

$$(c_{i,1}, \ldots, c_{i,n}) = (m\ k\ k')\, G_i, \quad i = 1, \ldots, N(k+r). \qquad (24)$$

Intuitively, if the $c_{i,j}$'s are arranged into a matrix, then $G_i$ is the encoding matrix for the $i$-th row. We next describe the construction of the $G_i$ matrices. For $i = 1, \ldots, Nk$, let the first $Nk(k+r)$ rows of $G_i$ be a random matrix, let the next $Nkz$ rows of $G_i$ be a Vandermonde matrix, and let the remaining $Nrz$ rows of $G_i$ be zero. Formally, for $i = 1, \ldots, Nk$,

$$G_i = \begin{pmatrix} R_i \\ V_i \\ 0 \end{pmatrix}, \qquad (25)$$

where $R_i \in \mathbb{F}_q^{Nk(k+r) \times n}$ is a random matrix with entries i.i.d. uniformly distributed over $\mathbb{F}_q$, and $V_i \in \mathbb{F}_q^{Nkz \times n}$ is a Vandermonde matrix, i.e., the $(u, v)$-th entry of $V_i$ equals $\alpha_{v,i}^{u-1}$. Here the $\alpha_{v,i}$ are distinct non-zero elements of $\mathbb{F}_q$, for $i = 1, \ldots, N(k+r)$ and $v = 1, \ldots, n$.

For $i = 1, \ldots, Nr$, let the first $Nkn + (i - 1)z$ rows of $G_{Nk+i}$ be a random matrix, let the next $z$ rows of $G_{Nk+i}$ be a Vandermonde matrix, and let the remaining $(Nr - i)z$ rows of $G_{Nk+i}$ be zero. Formally, for $i = 1, \ldots, Nr$,

$$G_{Nk+i} = \begin{pmatrix} R_{Nk+i} \\ V_{Nk+i} \\ 0 \end{pmatrix}, \qquad (26)$$

where $R_{Nk+i} \in \mathbb{F}_q^{(Nkn + (i-1)z) \times n}$ is a random matrix with entries i.i.d. uniformly distributed over $\mathbb{F}_q$, and $V_{Nk+i} \in \mathbb{F}_q^{z \times n}$ is a Vandermonde matrix, i.e., the $(u, v)$-th entry of $V_{Nk+i}$ equals $\alpha_{v,Nk+i}^{u-1}$. This completes the encoding scheme. The structure of the whole encoding matrix $(G_1, \ldots, G_{N(k+r)})$ is illustrated in Figure 3.


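[Figure 3: block diagram of $(G_1, \ldots, G_{N(k+r)})$; each $G_i$ is a block of $n$ columns, the first $Nk$ blocks spanning $Nkn$ columns and the remaining $Nr$ blocks spanning $Nrn$ columns.]

Fig. 3: Blockwise structure of the matrix $(G_1, \ldots, G_{N(k+r)})$. Blocks of random matrices are labelled by R, blocks of Vandermonde matrices are labelled by V, and blocks of zero matrices are labelled by 0.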


The following result shows that the scheme meets the security requirement deterministically, due to the Vandermonde matrices embedded in the $G_i$'s.

Theorem 4. The encoding scheme constructed in this section is secure, i.e., $H(m \mid c_I) = H(m)$ for all $I$ such that $|I| = z$.
Proof: Consider any $I$ such that $|I| = z$. As in Theorem 3, we first show that $H(k, k' \mid c_I, m) = 0$. Denote by $G_{i,I}$, $R_{i,I}$ and $V_{i,I}$ the submatrices formed by the set of columns in $I$ of $G_i$, $R_i$ and $V_i$, respectively. Let

$$\begin{aligned}
G_{1\to Nk,I} &= (G_{1,I}\ \cdots\ G_{Nk,I}), \\
R_{1\to Nk,I} &= (R_{1,I}\ \cdots\ R_{Nk,I}), \\
V_{1\to Nk,I} &= (V_{1,I}\ \cdots\ V_{Nk,I}).
\end{aligned}$$

Denote for short $c_i = (c_{i,j})_{j \in I}$; then by (24) it follows that

$$(m\ k\ k') \begin{pmatrix} R_{1\to Nk,I} \\ V_{1\to Nk,I} \\ 0 \end{pmatrix} = (c_1, \ldots, c_{Nk}).$$

Notice that $V_{1\to Nk,I}$ is an $Nkz \times Nkz$ square Vandermonde matrix. Therefore it is invertible and

$$k = \big((c_1, \ldots, c_{Nk}) - m\, R_{1\to Nk,I}\big)\, V_{1\to Nk,I}^{-1}.$$

Hence $H(k \mid c_I, m) = 0$. Then by (24), it follows that

$$(m\ k\ k') \begin{pmatrix} R_{Nk+1,I} \\ V_{Nk+1,I} \\ 0 \end{pmatrix} = c_{Nk+1}.$$

Notice that $V_{Nk+1,I}$ is a $z \times z$ square Vandermonde matrix. Therefore it is invertible and

$$(k'_1, \ldots, k'_z) = \big(c_{Nk+1} - (m\ k)\, R_{Nk+1,I}\big)\, V_{Nk+1,I}^{-1}.$$

Hence $H(k'_1, \ldots, k'_z \mid k, c_I, m) = 0$. Similarly, we can show that for $i = 1, \ldots, Nr$,

$$H(k'_{(i-1)z+1}, \ldots, k'_{iz} \mid k'_1, \ldots, k'_{(i-1)z}, k, c_I, m) = 0.$$

Therefore, by the chain rule,

$$H(k, k' \mid c_I, m) = H(k \mid c_I, m) + \sum_{i=1}^{Nr} H(k'_{(i-1)z+1}, \ldots, k'_{iz} \mid k'_1, \ldots, k'_{(i-1)z}, k, c_I, m) = 0. \qquad (27)$$

Given (27), we can follow exactly the same argument as (23) in the proof of Theorem 3 to show that $H(m \mid c_I) = H(m)$. This completes the proof. ■
B. Decoding

We describe the decoding scheme for any $I$ such that $|I| \ge n - r$. Let

$$d = \frac{Nk(n - |I|)}{|I| - z}; \qquad (28)$$

note that $d$ is an integer because $|I| - z$ divides $N$, and that $d$ is the solution to the equation $(Nk + d)|I| = Nkn + dz$. In order to decode, it suffices to read and communicate the first $Nk + d$ symbols over $\mathbb{F}_q$ from each available share. Intuitively, by reading the first $Nk + d$ symbols from each available share, we obtain a system of $(Nk + d)|I|$ equations. On the other hand, the variables involved in these equations are $m_1, \ldots, m_{Nk(k+r)}$, $k_1, \ldots, k_{Nkz}$ and $k'_1, \ldots, k'_{dz}$, i.e., the total number of variables is $Nkn + dz$. Because $d$ is the solution to $(Nk + d)|I| = Nkn + dz$, the number of equations in the system equals the number of variables, and the system is uniquely solvable if the equations are linearly independent.
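The choice of $N$ and $d$ can be checked with exact integer arithmetic. The sketch below (our illustration; the parameters are arbitrary) computes $N$ and, for every $|I| \ge n - r$, the value of $d$ from (28); it then checks that the numbers of equations and variables agree and that the resulting overhead equals $kz/(|I| - z)$, the value computed in the proof of Corollary 1.

```python
from fractions import Fraction
from math import lcm          # Python 3.9+


def decoding_plan(n, r, z):
    """For each |I| >= n - r: d from (28), symbols downloaded, and the overhead."""
    k = n - r - z
    N = lcm(*range(n - z - r, n - z + 1))
    plan = {}
    for size in range(n - r, n + 1):
        num = N * k * (n - size)
        assert num % (size - z) == 0                       # d is an integer
        d = num // (size - z)
        assert (N * k + d) * size == N * k * n + d * z    # equations == variables
        overhead = Fraction((N * k + d) * size - N * k * (k + r), N * (k + r))
        assert overhead == Fraction(k * z, size - z)      # equality in (7)
        plan[size] = (d, (N * k + d) * size, overhead)
    return N, plan


N, plan = decoding_plan(6, 2, 1)
print(N)                      # 60 = lcm(3, 4, 5)
for size, (d, downloaded, overhead) in plan.items():
    print(size, d, downloaded, overhead)
```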

Formally, for each $j \in I$ let $S_j = \mathbb{F}_q^{Nk+d}$ and $E_j(c_j) = (c_{1,j}, \ldots, c_{Nk+d,j})$. Denote for short $c_i = (c_{i,j})_{j \in I}$, and denote by $G_{i,I}$ the submatrix formed by the set of columns in $I$ of $G_i$. Then it follows from (24) that

$$e_I = (c_1, \ldots, c_{Nk+d}) = (m\ k\ k')\,(G_{1,I}, \ldots, G_{Nk+d,I}).$$

By construction (25) and (26), the last $(Nr - d)z$ rows of the matrices $G_{1,I}, \ldots, G_{Nk+d,I}$ are all zeros. Therefore we may delete the last $(Nr - d)z$ rows from $G_{1,I}, \ldots, G_{Nk+d,I}$, and denote by $(G^*_{1,I}, \ldots, G^*_{Nk+d,I})$ the corresponding trimmed matrix. It then follows that

$$e_I = (c_1, \ldots, c_{Nk+d}) = (m\ k\ k'_1, \ldots, k'_{dz})\,(G^*_{1,I}, \ldots, G^*_{Nk+d,I}).$$

It is now evident that if the matrix $(G^*_{1,I}, \ldots, G^*_{Nk+d,I})$ has full row rank, then it is right invertible and the secret can be recovered. The following result shows that the matrix indeed has full row rank with high probability.


Theorem 5. For any $I$ such that $|I| \ge n - r$, $(G^*_{1,I}, \ldots, G^*_{Nk+d,I})$ has full row rank with probability at least $1 - \frac{1}{q-1}$ over the distribution of the random matrices $R_1, \ldots, R_{Nk+d}$.

Proof: Note that $(G^*_{1,I}, \ldots, G^*_{Nk+d,I})$ has size $(Nkn + dz) \times (Nk + d)|I|$. By the definition of $d$, it follows that $Nkn + dz = (Nk + d)|I|$, and therefore the matrix is square. Hence it suffices to show that the matrix has full column rank, and in the following we show that the columns of the matrix are linearly independent with high probability.

The first $Nkz$ columns of $(G^*_{1,I}, \ldots, G^*_{Nk,I})$ are linearly independent because, by (25), the $(Nkn - Nkz + 1)$-th to the $Nkn$-th rows form a Vandermonde matrix. Denote the $i$-th column of $(G^*_{1,I}, \ldots, G^*_{Nk+d,I})$ by $g_i$. We first study the probability that $g_{Nkz+i}$ is in the linear span of all the previous columns, i.e., $\mathrm{span}[g_1, \ldots, g_{Nkz+i-1}]$, for $i = 1, \ldots, Nk(|I| - z)$. Consider the sum of vectors $g^* = \sum_{l=1}^{Nkz+i-1} \gamma_l g_l$. Fixing $\gamma_{Nkz+1}, \ldots, \gamma_{Nkz+i-1}$ to be arbitrary values in $\mathbb{F}_q$, there is a unique tuple $(\gamma_1, \ldots, \gamma_{Nkz})$ such that $g^*$ agrees with $g_{Nkz+i}$ in the $(Nkn - Nkz + 1)$-th to $Nkn$-th entries. Therefore there are $q^{i-1}$ different ways to linearly combine $g_1, \ldots, g_{Nkz+i-1}$ such that, in the resulting sum vector, the $(Nkn - Nkz + 1)$-th to $Nkn$-th entries are equal to the corresponding entries of $g_{Nkz+i}$. Because the first $Nk(n - z)$ entries of $g_{Nkz+i}$ are i.i.d. uniformly distributed, it follows that

$$\Pr\{g_{Nkz+i} \in \mathrm{span}[g_1, \ldots, g_{Nkz+i-1}]\} \le \frac{q^{i-1}}{q^{Nk(n-z)}}, \quad i = 1, \ldots, Nk(|I| - z). \qquad (29)$$

We next study the probability that $g_{j|I|+i}$ is in $\mathrm{span}[g_1, \ldots, g_{j|I|+i-1}]$. Consider an arbitrary $j$ with $Nk \le j \le Nk + d - 1$. By construction (26), $g_{j|I|+i} \notin \mathrm{span}[g_1, \ldots, g_{j|I|+i-1}]$ for $1 \le i \le z$, due to the Vandermonde matrix $V_{j+1}$. Now consider $g_{j|I|+i}$ with $z + 1 \le i \le |I|$. There are $q^{j|I|+i-z-1}$ different ways to linearly combine $g_1, \ldots, g_{j|I|+i-1}$ such that, in the resulting sum vector, the $(Nkn + (j - Nk)z + 1)$-th to $(Nkn + (j - Nk)z + z)$-th entries are equal to the corresponding entries of $g_{j|I|+i}$. Note that the first $Nkn + (j - Nk)z$ entries of $g_{j|I|+i}$ are i.i.d. uniformly distributed. Therefore, for $Nk \le j \le Nk + d - 1$ and $z + 1 \le i \le |I|$, it follows that

$$\Pr\{g_{j|I|+i} \in \mathrm{span}[g_1, \ldots, g_{j|I|+i-1}]\} \le \frac{q^{j|I|+i-z-1}}{q^{Nkn + (j - Nk)z}}. \qquad (30)$$
Hence, by the union bound,

$$\begin{aligned}
&\Pr\{(G^*_{1,I}, \ldots, G^*_{Nk+d,I}) \text{ singular}\} \\
&\le \sum_{i=1}^{Nk(|I|-z)} \Pr\{g_{Nkz+i} \text{ l.d.}^{4}\} + \sum_{j=Nk}^{Nk+d-1} \sum_{i=z+1}^{|I|} \Pr\{g_{j|I|+i} \text{ l.d.}\} \\
&\overset{(k)}{\le} \sum_{i=1}^{Nk(|I|-z)} \frac{q^{i-1}}{q^{Nk(n-z)}} + \sum_{j=Nk}^{Nk+d-1} \sum_{i=z+1}^{|I|} \frac{q^{j|I|+i-z-1}}{q^{Nkn+(j-Nk)z}} \\
&= \sum_{i=Nk(z-n)}^{Nk(|I|-n)-1} q^i + q^{Nk(z-n)} \sum_{j=Nk}^{Nk+d-1} \sum_{i=0}^{|I|-z-1} q^{j(|I|-z)+i} \\
&= \sum_{i=Nk(z-n)}^{Nk(|I|-n)-1} q^i + q^{Nk(z-n)} \sum_{i=Nk(|I|-z)}^{(Nk+d)(|I|-z)-1} q^i \\
&= \sum_{i=Nk(z-n)}^{Nk(|I|-n)-1} q^i + \sum_{i=Nk(|I|-n)}^{(Nk+d)|I|-Nkn-dz-1} q^i \\
&\overset{(l)}{=} \sum_{i=Nk(z-n)}^{Nk(|I|-n)-1} q^i + \sum_{i=Nk(|I|-n)}^{-1} q^i \\
&< \sum_{i=-\infty}^{-1} q^i = \frac{1}{q-1}, \qquad (31)
\end{aligned}$$

where (k) is due to (29) and (30), and (l) is due to (28). This completes the proof. ■
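The chain of equalities in (31) can be verified with exact rational arithmetic for small parameters. The sketch below (our illustration) evaluates the two sums on the right-hand side of (k) term by term, compares their total with the closed form $\sum_{i=Nk(z-n)}^{-1} q^i$, and checks that the result is below $1/(q-1)$.

```python
from fractions import Fraction
from math import lcm


def union_bound_check(n, r, z, size, q):
    """Evaluate the right-hand side of (k) in (31) exactly and the closed form after (l)."""
    k = n - r - z
    N = lcm(*range(n - z - r, n - z + 1))
    d = N * k * (n - size) // (size - z)                  # (28)
    Q = Fraction(q)
    first = sum(Q ** (i - 1 - N * k * (n - z))            # terms from (29)
                for i in range(1, N * k * (size - z) + 1))
    second = sum(Q ** (j * size + i - z - 1 - (N * k * n + (j - N * k) * z))   # from (30)
                 for j in range(N * k, N * k + d) for i in range(z + 1, size + 1))
    closed = sum(Q ** i for i in range(N * k * (z - n), 0))
    return first + second, closed


for size in (2, 3):
    total, closed = union_bound_check(3, 1, 1, size, q=13)
    print(size, total == closed, total < Fraction(1, 12))
```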
The following result summarizes the properties of the scheme.


Corollary 1. The encoding scheme constructed in Section VI-A is a rate-optimal $(n, k, r, z)$ secret sharing scheme with high probability. Specifically, the scheme meets the security requirement deterministically, and meets the reliability requirement with probability at least $1 - \sum_{i=0}^{r} \binom{n}{i} \frac{1}{q-1}$ over the distribution of the random matrices $R_1, \ldots, R_{N(k+r)}$. The scheme achieves the optimal decoding bandwidth when $d$ nodes participate in decoding, universally for all $n - r \le d \le n$.


4 Linearly dependent on the set of columns to the left. 
Proof: The scheme achieves capacity because $k = n - r - z$. By Theorem 4, the scheme meets the security requirement. By Theorem 5 and the union bound over all $I$ with $|I| \ge n - r$, the scheme meets the reliability requirement with probability at least $1 - \sum_{i=0}^{r} \binom{n}{i} \frac{1}{q-1} \ge 1 - \frac{2^n}{q-1}$.

Consider any $I$ such that $|I| \ge n - r$. In order to decode, $(Nk + d)|I|$ symbols over $\mathbb{F}_q$ are communicated. Therefore the communication overhead is

$$\begin{aligned}
\frac{(Nk + d)|I| - Nk(k + r)}{N(k + r)} &= \frac{Nkn + dz - Nk(k + r)}{N(k + r)} \\
&= \frac{Nkz + dz}{N(k + r)} = \frac{z\left(Nk + \frac{Nk(n - |I|)}{|I| - z}\right)}{N(k + r)} \\
&= \frac{Nk(n - z)z}{(|I| - z)N(k + r)} = \frac{kz}{|I| - z},
\end{aligned}$$

which achieves equality in (7). ■
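A small Monte Carlo experiment illustrates Corollary 1. The sketch below works under assumed parameters ($n = 3$, $r = 1$, $z = 1$, hence $k = 1$ and $N = 2$; $q = 13$, with the 12 non-zero field elements used as the distinct $\alpha_{v,i}$). It builds the matrices of (25) and (26), encodes a random secret, and decodes from every $I$ with $|I| \ge n - r$ by solving the trimmed square system; a failure is a singular draw. The empirical rate is only indicative and is not a substitute for Theorem 5.

```python
import random
from itertools import combinations
from math import comb, lcm

P = 13                                    # assumed field size (prime)
n, r, z = 3, 1, 1                         # assumed toy parameters
k = n - r - z
N = lcm(*range(n - z - r, n - z + 1))     # here N = lcm(1, 2) = 2
NK = N * k
ROWS = N * (k * n + r * z)                # length of (m, k, k')
NUM_SETS = sum(comb(n, s) for s in range(n - r, n + 1))
# Distinct non-zero alpha_{v,i}: this uses all 12 non-zero elements of F_13.
ALPHA = [[1 + n * i + v for v in range(n)] for i in range(N * (k + r))]


def build_matrices(rng):
    """G_1, ..., G_{N(k+r)} following (25) and (26); R blocks drawn uniformly."""
    mats = []
    for i in range(N * (k + r)):          # 0-based index of G_{i+1}
        if i < NK:
            n_rand, n_vand = NK * (k + r), NK * z
        else:
            n_rand, n_vand = NK * n + (i - NK) * z, z
        rand = [[rng.randrange(P) for _ in range(n)] for _ in range(n_rand)]
        vand = [[pow(ALPHA[i][v], u, P) for v in range(n)] for u in range(n_vand)]
        zero = [[0] * n for _ in range(ROWS - n_rand - n_vand)]
        mats.append(rand + vand + zero)
    return mats


def solve_or_none(a, y):
    """Gauss-Jordan over F_P for a square system; None if the matrix is singular."""
    size = len(a)
    m = [row[:] + [yi] for row, yi in zip(a, y)]
    for col in range(size):
        piv = next((i for i in range(col, size) if m[i][col]), None)
        if piv is None:
            return None
        m[col], m[piv] = m[piv], m[col]
        inv = pow(m[col][col], P - 2, P)
        m[col] = [v * inv % P for v in m[col]]
        for i in range(size):
            if i != col and m[i][col]:
                f = m[i][col]
                m[i] = [(u - f * v) % P for u, v in zip(m[i], m[col])]
    return [m[i][size] for i in range(size)]


def trial(rng):
    """Encode once, then decode from every I with |I| >= n - r; count singular draws."""
    mats = build_matrices(rng)
    x = [rng.randrange(P) for _ in range(ROWS)]            # (m, k, k')
    c = [[sum(x[t] * g[t][v] for t in range(ROWS)) % P for v in range(n)] for g in mats]
    failures = 0
    for size in range(n - r, n + 1):
        d = NK * (n - size) // (size - z)                  # (28)
        unknowns = NK * n + d * z                          # rows kept after trimming
        for I in combinations(range(n), size):
            eqs = [[mats[i][t][v] for t in range(unknowns)] for i in range(NK + d) for v in I]
            sol = solve_or_none(eqs, [c[i][v] for i in range(NK + d) for v in I])
            if sol is None:
                failures += 1
            else:
                assert sol == x[:unknowns]                 # secret and used keys recovered
    return failures


rng = random.Random(2024)
trials = 300
rate = sum(trial(rng) for _ in range(trials)) / (trials * NUM_SETS)
print(rate, 1 / (P - 1))   # empirical singular rate vs. the bound of Theorem 5
```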


VII. Conclusions 

In this paper we study the communication efficiency of secret sharing schemes in decoding. We 
prove an information-theoretic lower bound on the amount of information to be communicated during 
decoding, and show that the decoding bandwidth decreases as d, the number of nodes that participate in 
decoding, increases. We prove that the bound is uniformly tight by designing a secret sharing scheme 
that achieves the optimal decoding bandwidth universally for all valid d. The scheme is simple and is 
efficient in both space and computation. We construct another secret sharing scheme that achieves the optimal decoding bandwidth when all nodes are available. This scheme has an advantage in implementation because its codewords form a Reed-Solomon code. In the application of distributed storage, the proposed communication efficient secret sharing schemes also improve disk access efficiency. There are a number of interesting open problems: 1) in the application of distributed storage, how can one construct codes that are communication efficient in terms of both decoding and repair? 2) how can the results be generalized to other (non-threshold) access structures? and 3) is it possible to extend the schemes and ideas in the paper to improve the communication efficiency of other secure protocols that use secret sharing schemes as building blocks?